Using Linux sort, uniq, awk, and head to count and rank access log entries
During development we often collect access logs. These logs contain a very large number of URLs, many of them duplicated. Taking a URL log as an example, the task is to find the 5 URLs that appear most often and list them in descending order of frequency.
Linux command: cat url.log | sort | uniq -c | sort -n -r -k 1 -t ' ' | awk -F'//' '{print $2}' | head -5
Now let's analyze the meaning of each part of this pipeline, one step at a time. The steps below use a sample file named t1.log.
0) access log sample
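As a stand-in for a real access log, the following sketch creates a small t1.log with one URL per line. The addresses and their frequencies are hypothetical and chosen only so that duplicates exist.

```shell
# Hypothetical sample data: one URL per line, addresses made up for illustration.
cat > t1.log <<'EOF'
http://192.168.1.100
http://192.168.1.101
http://192.168.1.100
http://192.168.1.102
http://192.168.1.101
http://192.168.1.100
EOF

# Show the file that the following steps operate on.
cat t1.log
```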
1) cat t1.log | sort
Sorts the content of the data file. The sort command orders the lines lexicographically (by ASCII code), which guarantees that duplicate records end up next to each other.
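A minimal sketch of this step, using made-up addresses instead of a real log. LC_ALL=C is an assumption added here to force plain byte-order comparison regardless of the system locale.

```shell
# Hypothetical records; the addresses are invented for illustration.
sorted=$(printf '%s\n' \
  'http://192.168.1.101' \
  'http://192.168.1.100' \
  'http://192.168.1.101' \
  'http://192.168.1.100' | LC_ALL=C sort)

# After sorting, identical lines sit next to each other.
printf '%s\n' "$sorted"
```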
2) cat t1.log | sort | uniq -c
Here the pipe (|) passes the output of the command on its left to the command on its right as input. uniq -c merges adjacent duplicate records and prefixes each one with the number of times it occurred. Because uniq -c only merges records that are adjacent, the records must be sorted before it is used.
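The adjacency requirement can be checked with a small sketch. The three input lines are hypothetical; the same address appears first and last, so the duplicates are not adjacent until the input is sorted.

```shell
# Hypothetical input: the duplicate lines are separated by a different line.
input=$(printf '%s\n' \
  'http://192.168.1.100' \
  'http://192.168.1.101' \
  'http://192.168.1.100')

# Without sort, uniq -c sees three separate groups.
unsorted_groups=$(printf '%s\n' "$input" | uniq -c | wc -l | tr -d ' ')

# With sort, the two identical lines are merged into one group.
sorted_groups=$(printf '%s\n' "$input" | sort | uniq -c | wc -l | tr -d ' ')

echo "groups without sort: $unsorted_groups"
echo "groups with sort:    $sorted_groups"
```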
3) cat t1.log | sort | uniq -c | sort -k 1 -n -r
After uniq -c, each line has the form "2 data". The first field is a number, the count of repeated records, and the second field is the record itself. This output now needs to be sorted by count. sort -k 1 sorts on the first field of each line, which is the count. Because sort compares text in ASCII order by default, the counts would be compared character by character rather than by value, so in a descending sort the value 2 would be placed before the value 11. The -n option therefore tells sort to compare numerically. -r reverses the order, so the result is sorted in descending order, largest count first.
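The difference between text and numeric comparison can be seen on two hypothetical count lines. This is a sketch only; LC_ALL=C is an assumption added to make the text comparison byte-based on any system.

```shell
# Two hypothetical "count record" lines as uniq -c might produce them.
counts=$(printf '%s\n' \
  '2 http://192.168.1.100' \
  '11 http://192.168.1.101')

# Text comparison, reversed: "2" sorts above "11" because '2' > '1'.
lexical=$(printf '%s\n' "$counts" | LC_ALL=C sort -k 1 -r)

# Numeric comparison, reversed: 11 sorts above 2.
numeric=$(printf '%s\n' "$counts" | LC_ALL=C sort -k 1 -n -r)

echo "text order:";    printf '%s\n' "$lexical"
echo "numeric order:"; printf '%s\n' "$numeric"
```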
4) cat t1.log | sort | uniq -c | sort -k 1 -n -r | awk -F'//' '{print $2}'
The text produced by sort | uniq -c | sort -k 1 -n -r has the form "2 http://192.168.1.100". To get a result of the form 192.168.1.100, everything up to and including http:// must be removed, which awk can do. awk -F'//' uses // as the field separator, so the line is split into two fields: the part before // (the count and http:) and the part after it (192.168.1.100). {print $2} prints the second field, that is, 192.168.1.100.
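A short sketch of the field split on a single hypothetical line. The leading spaces imitate the padding that uniq -c puts in front of the count.

```shell
# Hypothetical line in the shape produced by uniq -c.
line='      2 http://192.168.1.100'

# With // as the separator, field 1 is "      2 http:" and field 2 is the host.
first=$(printf '%s\n' "$line" | awk -F'//' '{print $1}')
second=$(printf '%s\n' "$line" | awk -F'//' '{print $2}')

echo "field 1: [$first]"
echo "field 2: [$second]"
```

Note that the count ends up in the first field, so printing only $2 discards it.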
5) cat t1.log | sort | uniq -c | sort -k 1 -n -r | awk -F'//' '{print $2}' | head -5
The head command outputs the first lines of its input. head -5 keeps the first five lines of the sorted result, which are the five most frequent entries.
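Putting the steps together, the sketch below runs the whole pipeline on generated data. The data is hypothetical: six made-up addresses, where the address ending in 10N appears N times, so every count is distinct and the ranking is unambiguous. head -n 5 is the POSIX spelling of head -5.

```shell
# Generate hypothetical data in a temporary file:
# http://192.168.1.10N is written N times, for N = 1..6.
log=$(mktemp)
for n in 1 2 3 4 5 6; do
  i=0
  while [ "$i" -lt "$n" ]; do
    echo "http://192.168.1.10$n"
    i=$((i + 1))
  done
done > "$log"

# Count, rank by count in descending order, strip the prefix, keep the top 5.
top5=$(cat "$log" | sort | uniq -c | sort -k 1 -n -r | awk -F'//' '{print $2}' | head -n 5)
printf '%s\n' "$top5"

rm -f "$log"
```

The least frequent address, 192.168.1.101, is the only one left out of the result.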
6) The log format used above, such as http://192.168.1.100, is a simplified example. Real log lines are more complicated, for example GET http://192.168.1.100/auth. In that case the URL can be extracted with awk -F or with cut (using -d to set the delimiter and -f to select fields). For more information, see how to use the awk and cut commands.
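As a rough sketch of both approaches, the following extracts the host from one hypothetical request line. It assumes the line has exactly the shape shown; real log formats differ and would need other field numbers.

```shell
# Hypothetical request line: method, then a full URL with a path.
line='GET http://192.168.1.100/auth'

# awk: split on "/"; the fields are "GET http:", "", "192.168.1.100", "auth".
host_awk=$(printf '%s\n' "$line" | awk -F'/' '{print $3}')

# cut: take the second space-separated field (the URL), then its third "/" field.
host_cut=$(printf '%s\n' "$line" | cut -d ' ' -f 2 | cut -d '/' -f 3)

echo "awk: $host_awk"
echo "cut: $host_cut"
```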
References: http://www.linuxidc.com/Linux/2011-01/31347.htm