Using Linux sort, uniq, awk, and head to rank the URLs in an access log

Source: Internet
Author: User

During development we often collect access logs. The URLs in these logs are numerous, and many of them are repeated. Taking the URL as an example, the task is to find the five URLs that occur most often and list them in descending order of frequency.

Linux command: cat t1.log | sort | uniq -c | sort -k 1 -n -r | awk -F '//' '{print $2}' | head -5

Now let's analyze the meanings of these command combinations one by one.

0) access log sample
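
The sample itself is not shown here. As a stand-in, the following sketch creates a small t1.log with one URL per line. Only the http://192.168.1.100 form comes from this article; the other addresses and the number of repeats are made up for illustration.

```shell
# Hypothetical sample data: one URL per line, with repeats in no particular order.
cat > t1.log <<'EOF'
http://192.168.1.100
http://192.168.1.101
http://192.168.1.100
http://192.168.1.102
http://192.168.1.100
http://192.168.1.101
EOF

# Count the records that were written.
record_count=$(wc -l < t1.log | tr -d ' ')
echo "$record_count"
```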


1) cat t1.log | sort


This step sorts the contents of the data file. By default, sort orders the lines lexicographically (by ASCII code in the C locale), which guarantees that duplicate records end up next to each other.
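
A minimal sketch of this step, using a few hypothetical URLs fed in with printf instead of a real log file. After sorting, the identical lines sit next to each other.

```shell
# Hypothetical input: two URLs, with the duplicates interleaved.
sorted=$(printf '%s\n' \
  'http://192.168.1.101' \
  'http://192.168.1.100' \
  'http://192.168.1.101' \
  'http://192.168.1.100' | sort)

# The two .100 lines now come first, followed by the two .101 lines.
printf '%s\n' "$sorted"
```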


2) cat t1.log | sort | uniq -c


Here the pipe (|) passes the output of the command on its left as input to the command on its right. uniq -c merges adjacent duplicate records and prefixes each one with the number of times it occurred. Because uniq -c only merges adjacent records, the records must be sorted before this command is used.
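
To see why the sort is needed, the sketch below runs uniq -c on the same hypothetical input with and without sorting first. The exact padding in front of each count varies between uniq implementations, so only the counts themselves matter here.

```shell
# Hypothetical input: the two copies of .100 are separated by a .101 line.
urls='http://192.168.1.100
http://192.168.1.101
http://192.168.1.100'

# Without sorting, no duplicates are adjacent, so uniq -c reports three groups of 1.
unsorted_counts=$(printf '%s\n' "$urls" | uniq -c)

# With sorting, the two .100 lines are merged into one group with a count of 2.
sorted_counts=$(printf '%s\n' "$urls" | sort | uniq -c)

printf 'unsorted:\n%s\n' "$unsorted_counts"
printf 'sorted:\n%s\n' "$sorted_counts"
```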

3) cat t1.log | sort | uniq -c | sort -k 1 -n -r

After uniq -c, each line has the form "2 data". The first field is a number, the count of repeated records, and the second field is the record itself. We now sort this output. sort -k 1 makes the sort key start at the first field of each line, which is the count. Because sort compares text by default, a descending sort would place 2 before 11 (the character "2" is greater than the character "1"), so the -n option is needed to make sort compare numerically. -r reverses the order, so the result is sorted in descending order.
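
The difference between text and numeric comparison can be checked with a few made-up "count data" lines (the values 2, 11, and 3 and the labels a, b, c are illustrative only):

```shell
# Text comparison, descending: "3" > "2" > "1", so 11 ends up last.
text_order=$(printf '%s\n' '2 a' '11 b' '3 c' | sort -k 1 -r)

# Numeric comparison, descending: 11 > 3 > 2.
num_order=$(printf '%s\n' '2 a' '11 b' '3 c' | sort -k 1 -n -r)

printf 'text:\n%s\n' "$text_order"
printf 'numeric:\n%s\n' "$num_order"
```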


4) cat t1.log | sort | uniq -c | sort -k 1 -n -r | awk -F '//' '{print $2}'


After sort | uniq -c | sort -k 1 -n -r, each line has the form "2 http://192.168.1.100". The desired result is just 192.168.1.100, so the count and the http:// prefix must be removed, and awk is used for this. awk -F '//' sets the field separator to //, which splits the line into two fields: "2 http:" and 192.168.1.100. {print $2} prints the second field, that is, 192.168.1.100.
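
A minimal sketch of the awk step on a single line shaped like uniq -c output (the count 2 and the leading spaces are illustrative):

```shell
# One line as it might look after uniq -c: padded count, then the URL.
line='      2 http://192.168.1.100'

# Split on "//" and keep the second field, which is everything after the scheme.
host=$(printf '%s\n' "$line" | awk -F '//' '{print $2}')

printf '%s\n' "$host"   # 192.168.1.100
```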


5) cat t1.log | sort | uniq -c | sort -k 1 -n -r | awk -F '//' '{print $2}' | head -5


The head command prints the first x lines of its input. head -5 therefore returns the first five lines of the sorted result.
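
Putting the steps together, the sketch below runs the whole pipeline on generated input. The make_log helper and its addresses are hypothetical: host 192.168.1.10N is written N times, so every count is distinct and the ranking is unambiguous. head -n 5 is used here as the POSIX spelling of head -5.

```shell
# Hypothetical log generator: 192.168.1.10N appears N times, for N = 1..6.
make_log() {
  for n in 1 2 3 4 5 6; do
    i=0
    while [ "$i" -lt "$n" ]; do
      echo "http://192.168.1.10$n"
      i=$((i + 1))
    done
  done
}

# Sort, count, rank by count (descending), strip the scheme, keep the top five.
top5=$(make_log | sort | uniq -c | sort -k 1 -n -r | awk -F '//' '{print $2}' | head -n 5)

printf '%s\n' "$top5"
```

With this input the least frequent host, 192.168.1.101, is the one that is cut off.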


6) The log format above, such as http://192.168.1.100, is a simplified example. Real formats are more complicated, for example /Get http://192.168.1.100/auth. In such cases we can use awk -F or cut -d with -f to extract the URL. For more information, see the documentation for the awk and cut commands.
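
As a rough illustration, and assuming the fields of such a line are separated by single spaces, the URL and then the host could be extracted as follows. Real log layouts differ, so the field numbers would need adjusting.

```shell
# Example line taken from the text above.
line='/Get http://192.168.1.100/auth'

# The URL is the second whitespace-separated field; awk and cut both work.
url_awk=$(printf '%s\n' "$line" | awk '{print $2}')
url_cut=$(printf '%s\n' "$line" | cut -d ' ' -f 2)

# Splitting the URL on "/" gives "http:", "", the host, then the path,
# so the host is the third field.
host=$(printf '%s\n' "$url_cut" | cut -d '/' -f 3)

printf '%s\n%s\n' "$url_awk" "$host"
```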


References: http://www.linuxidc.com/Linux/2011-01/31347.htm

