Common shell commands and examples for log analysis (Linux shell)

Source: Internet
Author: User
Tags: egrep

Learn to analyze logs with the shell in just one morning!

Many places share shell scripts for log analysis, but they rarely explain what each command actually does, which makes them hard to learn from. This summary is meant to make getting started quick.

1. Windows users who want to use shell commands should install Cygwin; search Google for the installation steps (for technical questions, Google gives far better results than Baidu).

2. Below is a rough introduction to the commands commonly used for SEO log analysis. To learn more about any of them, please use Google.

less filename    views the file contents; press "q" to exit

cat filename    opens the file; several files can be opened at once, e.g. cat 1.log 2.log or cat *.log
grep -parameter filename
-i is case-insensitive
-v displays all lines that do NOT match the condition
-c displays the number of lines that match the condition

egrep is an upgraded version of grep with more complete regular-expression support; when you need regular expressions, egrep is recommended.
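
For example (access.log and the crawler names are only assumed values for illustration):

grep -ic 'baiduspider' access.log    # count matching lines, ignoring case
egrep 'Baiduspider|Googlebot' access.log    # extended regex such as alternation works in egrep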

head -2 filename    displays the first 2 lines
head -100 filename | tail -10 >> a.log    extracts lines 91-100 of the file

wc -parameter filename    counts the size, characters, and lines of a text file
-c counts the number of bytes
-m counts the number of characters
-l counts the number of lines
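
For example, counting the requests in an assumed access.log:

wc -l access.log    # number of lines, i.e. number of logged requests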

sort -parameter filename    sorts the file
-n sorts numerically
-r sorts in reverse order
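
A small sketch, assuming sizes.txt holds one number per line:

sort -n sizes.txt     # numeric, ascending
sort -nr sizes.txt    # numeric, descending (largest first)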

uniq -parameter filename    removes duplicate lines from the file; the file needs to be sorted first
-c prefixes each line with the number of times it repeats
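
For example, counting how often each URL appears (urls.txt is an assumed file with one URL per line):

sort urls.txt | uniq -c | sort -nr    # most frequent URLs first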

split -parameter filename    cuts a file into pieces
-l 100 (or -100) splits into one file per 100 lines
-b 25m (or 25k) splits by size into one file per 25 MB (or 25 KB)
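
A hedged sketch (baidu.log and the part_ prefix are assumptions):

split -l 100 baidu.log part_    # produces part_aa, part_ab, ... with 100 lines each
split -b 25m baidu.log part_    # produces pieces of 25 MB each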

| (pipe) passes the output of the previous command to the next command

">" in the ">" and ">>" redirect write file is equivalent to "W" emptied and written ">>" equivalent to "a" appended to the file

awk -F 'separator' 'pattern {action}' filename    splits each line of data on the specified character; the default separator is a space (site logs are space-separated)
-F is followed by the separator
pattern is the condition under which action is executed; regular expressions can be used here
$n is the nth field; $0 represents the entire line
NF is the number of fields in the current record
$NF is the last field
BEGIN and END can both be used as patterns: they give the program an initial state before input is read and let it perform cleanup work after the input is finished
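
A few hedged examples, assuming the space-separated site log format used later in this article ($7 = URL, $9 = status code):

awk '{print $7}' baidu.log    # print the 7th field (the URL) of every line
awk '$9 == 200 {print $NF}' baidu.log    # last field of each line whose status code is 200
awk 'BEGIN{n=0} /Baiduspider/ {n++} END{print n}' log.log    # count the Baiduspider lines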

bash shell.sh    runs the shell.sh script

dos2unix xxoo.sh    converts "\r\n" to "\n" (Windows --> Linux). Because Windows and Linux use different line endings, a script written under Windows needs dos2unix to convert it to Linux line endings, otherwise running the shell script will report errors.
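
A typical workflow, assuming a script named analyse.sh that was edited on Windows:

dos2unix analyse.sh    # convert the line endings first
bash analyse.sh        # then the script runs without "\r" errors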

unix2dos xxoo.sh    converts "\n" to "\r\n" (Linux --> Windows)
rm xx.txt    deletes the file xx.txt

3. Only some simple commands are introduced here; to really understand the shell, it is suggested you read the relevant books.

Now let's start analyzing logs with the shell.

1. Cut out Baidu's crawl data (extracting the specific crawler's data into its own file before processing improves efficiency).

The code is as follows:

cat log.log | grep -i 'baiduspider' > baidu.log

2. Count of each site status code

The code is as follows:

awk '{print $9}' baidu.log | sort | uniq -c | sort -nr

3. Baidu's total crawl count

The code is as follows:

wc -l baidu.log

4. Baidu's unique (non-repeated) crawl count

The code is as follows:

awk '{print $7}' baidu.log | sort | uniq | wc -l

5. Average data size per Baidu crawl (result in KB)

The code is as follows:

awk '{print $10}' baidu.log | awk 'BEGIN{a=0}{a+=$1}END{print a/NR/1024}'

6. Home page crawl count

The code is as follows:

awk '$7~/\.com\/$/' baidu.log | wc -l

7. Crawl count of a particular directory

The code is as follows:

grep '/news/' baidu.log | wc -l

8. The 10 most-crawled pages

The code is as follows:

awk '{print $7}' baidu.log | sort | uniq -c | sort -nr | head -10

9. Find the crawled pages that returned a 404 error

The code is as follows:

awk '$9~/^404$/{print $7}' baidu.log | sort | uniq

10. Find out which JS files were crawled and how many times each was crawled

The code is as follows:

awk '$7~/\.js$/{print $7}' baidu.log | sort | uniq -c | sort -nr
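
The same building blocks can be combined further. As a hedged sketch (the field numbers follow the log format assumed above), crawl counts per top-level directory could be obtained with:

awk '{print $7}' baidu.log | awk -F '/' '{print "/"$2"/"}' | sort | uniq -c | sort -nr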
