Learn shell log analysis in just one morning!
Many posts share shell scripts for log analysis, but they rarely explain what each command actually means, so the learning cost is high. Everything is summed up here to help you get started quickly.
1. Windows users who want to use shell commands should install Cygwin; Google the installation method (please use Google for technical questions; Baidu search isn't up to it).
2. Below is a rough introduction to the shell commands commonly used in SEO log analysis. To learn more about any command, use Google.
less filename  View a file's contents; press "q" to quit
cat filename  Print a file's contents; several files can be opened at once: cat 1.log 2.log, or cat *.log
grep -option filename
-i  Match case-insensitively
-v  Show only the lines that do NOT match
-c  Count the matching lines instead of printing them
egrep is an upgraded version of grep with more complete regular-expression support; when you work with regexes, egrep is recommended.
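A quick illustration of these flags, assuming a hypothetical access.log:
grep -i 'baiduspider' access.log             # lines mentioning Baiduspider, any case
grep -ic 'baiduspider' access.log            # just the count of such lines
grep -iv 'baiduspider' access.log            # every line EXCEPT those
egrep -i 'baiduspider|googlebot' access.log  # egrep regex alternation: either crawler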
head -2 filename  Show the first 2 lines of the file
head -100 filename | tail -10 >> a.log  Extract lines 91-100 of the file
wc -option filename  Count a file's bytes, characters, or lines
-c  Count bytes
-m  Count characters
-l  Count lines
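For example, on the same hypothetical access.log:
wc -l access.log  # number of lines; in a log, that is the number of requests
wc -c access.log  # size in bytes
wc -m access.log  # number of characters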
sort -option filename  Sort the file
-n  Sort numerically
-r  Reverse the sort order
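A minimal sketch, assuming a file counts.txt with one number per line:
sort -n counts.txt   # ascending numeric order
sort -nr counts.txt  # descending: largest values first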
uniq -option filename  De-duplicate the file; you need to run sort on it first, because uniq only collapses adjacent duplicate lines
-c  Prefix each line with the number of times it occurs
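Together they give the classic frequency count. A sketch on a hypothetical urls.txt (one URL per line):
sort urls.txt | uniq -c | sort -nr
# output is count then value, most frequent first, e.g. (made-up numbers):
#   37 /news/
#    5 /about/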
split -option filename  Cut a file into pieces
-100  One output file per 100 lines (equivalent to -l 100)
-C 25m/k  Split into files of at most 25 MB/KB without breaking lines (-b splits by exact byte count instead)
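A hedged sketch, assuming a large file big.log:
split -l 100 big.log part_   # part_aa, part_ab, ... each 100 lines
split -C 25m big.log chunk_  # chunks of at most 25 MB, lines kept whole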
|  Pipe: passes the output of the previous command to the next command as its input
">" in the ">" and ">>" redirect write file is equivalent to "W" emptied and written ">>" equivalent to "a" appended to the file
awk -F 'separator' 'pattern {action}' filename  Splits each line of data on the specified character; the default separator is whitespace (site logs are space-separated)
-F  Specifies the field separator
pattern is the condition under which the action runs; regular expressions can be used here
$n  The nth field of the line; $0 is the entire line
NF  The number of fields in the current record
$NF  The last field
BEGIN and END can both be used as the pattern: they give the program an initial state before any input is read and let it do cleanup work after the input is finished
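Putting those pieces together, a hedged sketch on a space-separated access log (the field positions are assumptions: $7 = URL, $9 = status code, $10 = response size in bytes):
awk '$9 == 404 {print $7}' access.log  # URLs that returned 404
awk '{print $NF}' access.log           # the last field of every line
awk 'BEGIN{total=0} {total+=$10} END{print total, "bytes in", NR, "lines"}' access.log
awk -F ':' '{print $1}' /etc/passwd    # -F: split on ":" and print the user name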
bash shell.sh  Run the shell.sh script
dos2unix xxoo.sh  Converts "\r\n" to "\n" (Windows --> Linux). Windows and Linux use different line endings, so a script written under Windows must be converted with dos2unix to Linux line endings, or running it will throw errors.
unix2dos xxoo.sh  Converts "\n" to "\r\n" (Linux --> Windows)
rm xx.txt  Delete the file xx.txt
3. Only a few simple commands are introduced here; if you want a deeper understanding of the shell, we suggest reading the relevant books.
Now let's start using the shell to analyze the log.
1. Cut out Baidu's crawl data (working on a file that contains only the crawler's records is more efficient):
cat log.log | grep -i 'Baiduspider' > baidu.log
2. Count the site's status codes (the status code is field 9 in a standard access log):
awk '{print $9}' baidu.log | sort | uniq -c | sort -nr
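The output is one line per status code, with the count first. Hypothetical, made-up output for illustration:
1023 200
  87 301
  12 404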
3. Baidu's total crawl count (every line of baidu.log is one fetch, so we just count lines):
wc -l baidu.log
4. Baidu's unique crawl count (distinct URLs only):
awk '{print $7}' baidu.log | sort | uniq | wc -l
5. The average amount of data per Baidu fetch (result in KB):
awk '{print $10}' baidu.log | awk 'BEGIN{a=0}{a+=$1}END{print a/NR/1024}'
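The same thing can be done in a single awk pass, a sketch assuming field 10 holds the response size in bytes:
awk '{a+=$10} END{print a/NR/1024}' baidu.log
# a accumulates the bytes and NR counts the records, so a/NR/1024 is the average in KB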
6. Home page fetch count:
awk '$7~/\.com\/$/' baidu.log | wc -l
7. Fetch count for a specific directory:
grep '/news/' baidu.log | wc -l
8. The 10 most-fetched pages:
awk '{print $7}' baidu.log | sort | uniq -c | sort -nr | head -10
9. Find the crawled pages that returned a 404 error:
awk '$9~/^404$/{print $7}' baidu.log | sort | uniq
10. Find which JS files were crawled and how many times each was fetched:
awk '$7~/\.js$/{print $7}' baidu.log | sort | uniq -c | sort -nr
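To avoid retyping these one-liners, the steps can be bundled into one script. A hedged sketch (the name seo_report.sh, the log name log.log, and the field positions $7/$9/$10 are all assumptions; adjust them to your log format), run with bash seo_report.sh:
#!/bin/bash
# seo_report.sh - hypothetical wrapper around the analyses above
LOG=log.log

# step 1: keep only Baidu's records
grep -i 'Baiduspider' "$LOG" > baidu.log

echo "== status codes =="
awk '{print $9}' baidu.log | sort | uniq -c | sort -nr

echo "== total fetches =="
wc -l < baidu.log

echo "== unique URLs fetched =="
awk '{print $7}' baidu.log | sort | uniq | wc -l

echo "== average fetch size (KB) =="
awk '{a+=$10} END{print a/NR/1024}' baidu.log

echo "== top 10 pages =="
awk '{print $7}' baidu.log | sort | uniq -c | sort -nr | head -10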