This afternoon I was busy writing a small script and ran into a lot of errors along the way. While debugging, I dug into the details of sort and uniq and also found some careless mistakes in my own script. Programming is a rigorous discipline: every character must be exact. Get one wrong and, at best, the result is incorrect; at worst, the program will not run at all. Let me summarize these issues now!
This afternoon's script:
Write a script that:
1. Downloads the file ftp://192.168.0.254/pub/files/access_log to the /tmp directory;
2. Analyzes the /tmp/access_log file and displays the five IP addresses that appear most often at the beginning of a line, along with how many times each one appears;
3. Extracts from /tmp/access_log the string consisting of http:// followed by a domain name or IP address, e.g. the http://www.linux.com part of http://www.linux.com/install/images/style.css, and displays the five such strings that appear most often.
Requirements: functions 2 and 3 must be implemented as shell functions.
# The access_log file looks like this:
...AppleWebKit/534.13 (KHTML, like Gecko) Chrome/9.0.597.107 Safari/534.13"
192.168.0.191 - - [24/Jul/2011:17:43:17 +0800] "GET /static/image/cr180_dzx//scrolltop.gif HTTP/1.1" 304 - "http://www.linux.com/forum.php" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/534.13 (KHTML, like Gecko) Chrome/9.0.597.107 Safari/534.13"
192.168.0.191 - - [24/Jul/2011:17:43:17 +0800] "GET /uc_server/images/noavatar_small.gif HTTP/1.1" 304 - "http://www.linux.com/forum.php" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/534.13 (KHTML, like Gecko) Chrome/9.0.597.107 Safari/534.13"
192.168.0.191 - - [24/Jul/2011:17:43:17 +0800] "GET /favicon.ico HTTP/1.1" 304 - "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/534.13 (KHTML, like Gecko) Chrome/9.0.597.107 Safari/534.13"
192.168.0.191 - - [24/Jul/2011:17:43:17 +0800] "GET /forum.php HTTP/1.1" 200 17354 "http://www.linux.com/group.php" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/534.13 (KHTML, like Gecko) Chrome/9.0.597.107 Safari/534.13"
192.168.0.191 - - [24/Jul/2011:17:43:17 +0800] "GET /data/cache/style_2_common.css?O4r HTTP/1.1" 304 - "http://www.linux.com/forum.php" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/534.13 (KHTML, like Gecko) Chrome/9.0.597.107 Safari/534.13"
192.168.0.191 - - [24/Jul/2011:17:43:17 +0800] "GET /data/cache/style_2_forum_index.css?O4r HTTP/1.1" 304 - "http://www.linux.com/forum.php" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/534.13 (KHTML, like Gecko) Chrome/9.0.597.107 Safari/534.13"
192.168.0.191 - - [24/Jul/2011:17:43:17 +0800] "GET /static/js/common.js?O4r HTTP/1.1" 304 - "http://www.linux.com/forum.php" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/534.13 (KHTML, like Gecko) Chrome/9.0.597.107 Safari/534.13"
192.168.0.191 - - [24/Jul/2011:17:43:17 +0800] "GET /static/image/cr180_dzx//bg.jpg HTTP/1.1" 304 - "http://www.linux.com/forum.php" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/534.13 (KHTML, like Gecko) Chrome/9.0.597.107 Safari/534.13"
192.168.0.191 - - [24/Jul/2011:17:43:17 +0800] "GET /static/image/diy/panel-toggle.png HTTP/1.1" 304 - "http://www.linux.com/forum.php" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/534.13 (KHTML, like Gecko) Chrome/9.0.597.107 Safari/534.13"
Our goal is to capture the specified information and rank it, so we have to rely on regular expressions, three long ones, and writing them takes real care. While writing these three regular expressions I ran into plenty of problems. Following the requirements of the exercise, I used the sed command to capture and replace the specified content, and made the following mistakes:
1. At the end of the regular expression I forgot to add .*, so the rest of the long line was left unreplaced (see the sketch after this list).
2. After the sed extraction, some lines that do not meet the requirement still remain; use grep "^http://" to filter out these useless lines (this step held me up for a long time).
3. I wrote \{\} as \{}\. It is only a clerical error, but in a regular expression this long such an error cannot be forgiven; you should write the escaped brace pair in full first and only then fill in the matching content, to avoid this mistake.
4. I forgot one of the two '' quotes around the sed expression. Write both quotes first, then fill in the expression between them, so you cannot forget one.
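To make pitfalls 1 and 3 concrete, here is a minimal sketch (my own illustration, using a made-up sample line rather than the real log):

# A made-up line containing a URL:
echo 'foo "http://www.linux.com/forum.php" bar' > /tmp/demo

# WRONG: no trailing .* -- everything after the captured URL survives
sed 's@.*\(http://[a-zA-Z]\{1,\}\.[a-zA-Z]\{1,\}\.[a-zA-Z]\{1,\}\)@\1@g' /tmp/demo
# -> http://www.linux.com/forum.php" bar

# RIGHT: .* on both sides swallows the rest of the line,
# and every escaped brace is written as a complete \{1,\} pair
sed 's@.*\(http://[a-zA-Z]\{1,\}\.[a-zA-Z]\{1,\}\.[a-zA-Z]\{1,\}\).*@\1@g' /tmp/demo
# -> http://www.linux.com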
My second function is as follows:
function URL {
    # First pass: URLs whose host is a three-label domain name (letters only)
    sed '1,$s@.*\(http://[a-zA-Z]\{1,\}\.[a-zA-Z]\{1,\}\.[a-zA-Z]\{1,\}\).*@\1@g' /tmp/access_log | grep "^http://" > /tmp/tt.1
    # Second pass: URLs whose host is a dotted-quad IP address
    sed '1,$s@.*\(http://[0-9]\{1,\}\.[0-9]\{1,\}\.[0-9]\{1,\}\.[0-9]\{1,\}\).*@\1@g' /tmp/access_log | grep "^http://" >> /tmp/tt.1
    echo -e " \033[33mTIMES Domain\033[0m \033[5;32m<--------- Here is the domain rank\033[0m"
    sort /tmp/tt.1 | uniq -c | sort -rn | head -5
}
Obviously, a regular expression this long is prone to errors; I hope to keep these lessons in mind in the future when writing sed commands, regular expressions, and similar constructs.
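As an aside, the same extraction can be done more compactly with grep -o (a GNU grep option that prints only the matched part of each line, one match per line). This is my own variation, not what the exercise asked for, so treat it as a sketch:

grep -o 'http://[a-zA-Z0-9.]\{1,\}' /tmp/access_log | sort | uniq -c | sort -rn | head -5

Because [a-zA-Z0-9.] cannot match /, the match stops at the end of the host part, which covers both domain names and dotted IP addresses in a single pass.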
sort and uniq
These two commands have some easily overlooked pitfalls. For example, plain sort does not compare numbers by their numeric value but character by character, so "11983" sorts before "2" (pay very close attention to this!). To rank counts, therefore, use sort -rn, which sorts numerically in descending order.
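A quick demonstration of the difference, with made-up numbers:

printf '2\n11983\n4\n' | sort
# 11983   <- "1" sorts before "2" as a character
# 2
# 4
printf '2\n11983\n4\n' | sort -rn
# 11983   <- numeric, descending: what a ranking needs
# 4
# 2

The second pitfall: when uniq processes data that has not been run through sort beforehand, the result is not what you want, as shown below: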
[root@dean 725-27]# sed '1,$s@.*\(http://[a-zA-Z]\{1,\}\.[a-zA-Z]\{1,\}\.[a-zA-Z]\{1,\}\).*@\1@g' /tmp/access_log | grep "^http://" | uniq -c
      2 http://www.baidu.com    # Baidu appears again below!
  11983 http://www.linux.com
      1 http://i.ifeng.com
   3761 http://www.linux.com
      4 http://www.baidu.com    # repeated!
This happens because uniq does not merge all duplicate lines, only consecutive ones! The correct approach is therefore to run the file through sort first, so that all duplicates end up next to each other, and only then pipe it into uniq:
sort file | uniq -c | sort -rn
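And a tiny illustration of why the sort matters, again with made-up data:

printf 'a\nb\na\n' | uniq -c
#   1 a    <- the two a lines are not adjacent, so they are not merged
#   1 b
#   1 a
printf 'a\nb\na\n' | sort | uniq -c
#   2 a
#   1 b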
The code for the entire script is as follows:
#!/bin/bash
# Download the access log, then rank the top IPs and referer domains.
cd /tmp
wget ftp://192.168.0.254/pub/Files/access_log
echo -e "\033[32mdownload successful!\033[0m"
echo "---------------------------------------"
FILE=/tmp/access_log

function IP {
    echo -e " \033[33mTIMES IP\033[0m \033[5;32m<------------ Here is the ip rank\033[0m"
    awk '{print $1}' $FILE | sort | uniq -c | sort -rn | head -5
}

function URL {
    sed '1,$s@.*\(http://[a-zA-Z]\{1,\}\.[a-zA-Z]\{1,\}\.[a-zA-Z]\{1,\}\).*@\1@g' /tmp/access_log | grep "^http://" > /tmp/tt.1
    sed '1,$s@.*\(http://[0-9]\{1,\}\.[0-9]\{1,\}\.[0-9]\{1,\}\.[0-9]\{1,\}\).*@\1@g' /tmp/access_log | grep "^http://" >> /tmp/tt.1
    echo -e " \033[33mTIMES Domain\033[0m \033[5;32m<--------- Here is the domain rank\033[0m"
    sort /tmp/tt.1 | uniq -c | sort -rn | head -5
}

IP
URL
rm -f /tmp/tt.1    # clean up the temporary file
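One possible hardening, which is not in the original script: the "download successful" message prints even if wget fails. Checking wget's exit status would avoid a misleading message, for example:

wget ftp://192.168.0.254/pub/Files/access_log || {
    echo -e "\033[31mdownload failed!\033[0m"
    exit 1
}
echo -e "\033[32mdownload successful!\033[0m"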
# Conclusion: When writing a shell script, first think through how each command is used. Where a command's format is error-prone, write the error-prone parts first so you cannot get them wrong. The same applies elsewhere: write the closing fi of an if statement (and the then after its condition), the done after a loop body, and the closing esac of a case statement (with its default branch written as *) rather than '*') before filling in the bodies, and remember to delete the temporary cache file at the end.
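A minimal sketch of this "write the pairs first" habit (my own illustration):

# Write if/then ... fi as an empty skeleton first, then fill in the body:
if [ -f /tmp/access_log ]; then
    :    # body goes here
fi

# Same for case: esac and the *) default branch go in first:
case "$1" in
    start) echo "starting" ;;
    *) echo "usage: $0 start" ;;    # *) not '*'
esac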