1. Problem
Given a log file named 1.log, an excerpt of which follows:
112.111.12.248 - [25/Sep/2013:16:08:31 +0800]formula-x.haotui.com "/seccode.php?update=0.5593110133088248" 200"http://formula-x.haotui.com/registerbbs.php" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1;)"
61.147.76.51 - [25/Sep/2013:16:08:31 +0800]xyzdiy.5d6d.com "/attachment.php?aid=4554&k=9ce51e2c376bc861603c7689d97c04a1&t=1334564048&fid=9&sid=zgohwYoLZq2qPW233ZIRsJiUeu22XqE8f49jY9mouRSoE71" 301"http://xyzdiy.×××thread-1435-1-23.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
Count the number of requests made by each IP address.
2. Analysis
In the log, the IP address is the first field of every line. So the task is to extract the first field of each line in 1.log and then count how many times each IP occurs.
Use awk to extract the first field. To count the requests per IP, first sort the extracted IPs with the sort command so that identical addresses are adjacent, then count each group with uniq.
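The plan above can be tried on a small sample before running it on the real log. The following is a minimal sketch: the three log lines are made up for illustration (shortened versions of the format shown earlier), and the temporary file created with mktemp stands in for 1.log.

```shell
#!/bin/sh
# Hypothetical sample data: three requests from two IPs (not taken from the real 1.log).
log=$(mktemp)
cat > "$log" <<'EOF'
112.111.12.248 - [25/Sep/2013:16:08:31 +0800]formula-x.haotui.com "/seccode.php" 200
61.147.76.51 - [25/Sep/2013:16:08:31 +0800]xyzdiy.5d6d.com "/attachment.php" 301
112.111.12.248 - [25/Sep/2013:16:08:32 +0800]formula-x.haotui.com "/index.php" 200
EOF

# Step 1: keep only the first whitespace-separated field (the IP address).
ips=$(awk '{print $1}' "$log")
echo "$ips"

# Step 2: sort so identical IPs are adjacent, then count each group.
counts=$(echo "$ips" | sort | uniq -c)
echo "$counts"

# Remove the temporary sample file.
rm -f "$log"
```

Running each step separately like this makes it easy to check the intermediate output before chaining everything into one pipeline.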
3. The shell command
A single shell pipeline is sufficient for this problem:
awk '{print $1}' 1.log | sort -n | uniq -c | sort -n
Explanation:
- awk is well suited to splitting lines into fields. Here {print $1} prints the first field. A delimiter can be specified with -F; if none is given, awk splits on whitespace (spaces, tabs, and so on). In this log the IP address is the first field.
- sort sorts its input, and the -n option sorts numerically. Without -n, lines are sorted in character (ASCII) order. For the IP addresses in this problem, numeric order is easier to read. Either way, identical IPs end up on adjacent lines, which is what the next step needs.
- uniq removes duplicate lines: when several adjacent lines are identical, it keeps only one of them. The -c option prefixes each line with the number of times it occurred, so uniq -c is exactly what counts the requests per IP. Note that uniq only compares adjacent lines, so the input must be sorted first.
- The final sort -n sorts by request count in ascending order, so the IPs with the most requests appear last. Adding the -r option (sort -nr) reverses the order and puts the busiest IPs first.
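Two of the points above, that uniq needs sorted input and that sort -nr reverses the ranking, can be checked with a short experiment. This is only an illustrative sketch; the 10.0.0.x addresses are made-up input, not data from 1.log.

```shell
#!/bin/sh
# Hypothetical input: one IP per line, deliberately unsorted.
ips=$(printf '%s\n' 10.0.0.2 10.0.0.1 10.0.0.2 10.0.0.1 10.0.0.2)

# Without sorting, uniq only merges adjacent duplicates, so every line
# here ends up as its own group with a count of 1.
unsorted=$(echo "$ips" | uniq -c)
echo "$unsorted"

# After sorting, identical IPs are adjacent and the counts are correct.
# sort -nr then lists the IP with the most requests first.
ranked=$(echo "$ips" | sort | uniq -c | sort -nr)
echo "$ranked"
```

On a large log, appending something like `| head -n 10` to the ranked pipeline would show only the ten busiest IPs.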
4. Conclusion
There is another way to solve this problem; it will be posted tomorrow.
Daily Shell Scripting Exercises (02)