Examples of awk scenarios

Source: Internet
Author: User

Take the following log record as an example:

202.189.63.115 - - [31/Aug/2012:15:42:31 +0800] "GET / HTTP/1.1" 200 1365 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/20100101 Firefox/15.0.1"
$0 is the whole record line.
$1 is the client IP "202.189.63.115".
$4 is the first half of the request time "[31/Aug/2012:15:42:31".
$5 is the second half of the request time "+0800]".

And so on.
With the default field separator (whitespace), we can extract the following kinds of information from the log:

awk '{print $1}' access.log       # IP address (%h)
awk '{print $2}' access.log       # RFC 1413 identity (%l)
awk '{print $3}' access.log       # user ID (%u)
awk '{print $4,$5}' access.log    # date and time (%t)
awk '{print $7}' access.log       # URI (part of the request line, %r)
awk '{print $9}' access.log       # status code (%>s)
awk '{print $10}' access.log      # response size (%b)
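To see where each field number comes from, the sketch below splits one record on whitespace and prints every field with its index. The record is a hypothetical sample in Apache combined format; the status code 200 in it is an assumption made for illustration.

```shell
# Hypothetical sample record (combined log format; status 200 is assumed).
line='202.189.63.115 - - [31/Aug/2012:15:42:31 +0800] "GET / HTTP/1.1" 200 1365 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/20100101 Firefox/15.0.1"'

# Print every whitespace-separated field together with its index.
fields=$(printf '%s\n' "$line" | awk '{ for (i = 1; i <= NF; i++) printf "$%d = %s\n", i, $i }')
printf '%s\n' "$fields"
```

Note how the user agent is spread over $12 and later fields, which is why the default separator is awkward for quoted values.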

With only the default field separator, it is inconvenient to extract other information such as the request line, the referer, and the browser type, because these values contain an unpredictable number of spaces. We therefore change the field separator to the double-quote character (") so that this information can be read easily.

awk -F\" '{print $2}' access.log    # request line (%r)
awk -F\" '{print $4}' access.log    # referer
awk -F\" '{print $6}' access.log    # browser (user agent)

Note: To keep the Unix/Linux shell from treating " as the start of a string, we escape it with a backslash.
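A minimal sketch of the quote-separated split, again on a hypothetical sample record. It also shows -F'"' as an alternative way to protect the separator from the shell; both spellings pass the same single character to awk.

```shell
# Hypothetical sample record (combined log format; status 200 is assumed).
line='202.189.63.115 - - [31/Aug/2012:15:42:31 +0800] "GET / HTTP/1.1" 200 1365 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/20100101 Firefox/15.0.1"'

# With " as the separator, the even-numbered fields hold the quoted values.
request=$(printf '%s\n' "$line" | awk -F\" '{print $2}')
referer=$(printf '%s\n' "$line" | awk -F\" '{print $4}')
# -F'"' quotes the separator with single quotes instead of a backslash.
agent=$(printf '%s\n' "$line" | awk -F'"' '{print $6}')

printf 'request: %s\nreferer: %s\nagent:   %s\n' "$request" "$referer" "$agent"
```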

Examples of using awk

Counting browser types

If we want to know which types of browsers have visited the site, sorted in descending order of frequency, we can use the following command:

awk -F\" '{print $6}' access.log | sort | uniq -c | sort -nr

This command line first extracts the browser field, then pipes it into the first sort command. The first sort groups identical values together so that uniq -c can count how many times each browser appears. The last sort command arranges the counts in descending order and prints them.
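The pipeline can be tried on a small made-up log. The file contents and the agent names AgentA/1.0 and AgentB/2.0 below are hypothetical; sort -nr is used so that the counts are compared numerically, and the sed step only strips the leading padding that uniq -c adds.

```shell
# Build a tiny hypothetical access log in a temporary file.
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 - - [31/Aug/2012:14:01:00 +0800] "GET / HTTP/1.1" 200 100 "-" "AgentA/1.0"
10.0.0.2 - - [31/Aug/2012:14:02:00 +0800] "GET /a HTTP/1.1" 200 100 "-" "AgentB/2.0"
10.0.0.3 - - [31/Aug/2012:14:03:00 +0800] "GET /b HTTP/1.1" 200 100 "-" "AgentA/1.0"
EOF

# Extract the user agent, group identical lines, count them, most frequent first.
top=$(awk -F\" '{print $6}' "$log" | sort | uniq -c | sort -nr | sed 's/^ *//')
printf '%s\n' "$top"
rm -f "$log"
```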

Discovering problems with the system

We can use the following command line to count the status codes returned by the server and to discover possible problems with the system.

awk '{print $9}' access.log | sort | uniq -c | sort

Normally, the status code 200 or 30x should be the most frequently occurring. 40x typically represents a client access issue. 50x generally indicates a server-side problem.

Here are some common status codes:

    • 200 - The request succeeded; the requested response headers or body are returned with this response.
    • 206 - The server has successfully processed a partial GET request.
    • 301 - The requested resource has been permanently moved to a new location.
    • 302 - The requested resource temporarily resides under a different URI.
    • 400 - Bad request. The server could not understand the request.
    • 401 - Unauthorized. The request requires user authentication.
    • 403 - Forbidden. The server understood the request but refuses to fulfill it.
    • 404 - Not found. The resource does not exist on the server.
    • 500 - The server encountered an unexpected condition that prevented it from fulfilling the request.
    • 503 - The server is currently unable to handle the request due to temporary maintenance or overloading.

HTTP status code definitions can be found in: Hypertext Transfer Protocol -- HTTP/1.1
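Since the first digit of a status code identifies its class, one way to get a quick health overview is to tally responses per class in a single awk pass. This is a sketch on a hypothetical five-line log, not part of the original command set.

```shell
# Build a tiny hypothetical access log in a temporary file.
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 - - [31/Aug/2012:14:01:00 +0800] "GET / HTTP/1.1" 200 100 "-" "AgentA/1.0"
10.0.0.2 - - [31/Aug/2012:14:02:00 +0800] "GET /old HTTP/1.1" 301 0 "-" "AgentA/1.0"
10.0.0.3 - - [31/Aug/2012:14:03:00 +0800] "GET /missing HTTP/1.1" 404 50 "-" "AgentB/2.0"
10.0.0.4 - - [31/Aug/2012:14:04:00 +0800] "GET /x HTTP/1.1" 404 50 "-" "AgentB/2.0"
10.0.0.5 - - [31/Aug/2012:14:05:00 +0800] "GET /app HTTP/1.1" 503 0 "-" "AgentA/1.0"
EOF

# Group status codes by their first digit (2xx, 3xx, 4xx, 5xx) and count each class.
classes=$(awk '{ n[substr($9, 1, 1) "xx"]++ } END { for (c in n) print c, n[c] }' "$log" | sort)
printf '%s\n' "$classes"
rm -f "$log"
```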

Examples of awk commands for status codes:

1. Find and display all requests with a status code of 404

awk '($9 ~ /404/)' access.log

2. List the status code and URI of every request with a status code of 404, sorted (append "| uniq -c" to count the occurrences of each URI)

awk '($9 ~ /404/)' access.log | awk '{print $9,$7}' | sort
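If an actual count per URI is wanted, one possible approach is to let awk accumulate the totals in an array. The log below is hypothetical, and the comparison $9 == 404 is a swapped-in exact match rather than the regex match used above.

```shell
# Build a tiny hypothetical access log in a temporary file.
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 - - [31/Aug/2012:14:01:00 +0800] "GET /missing HTTP/1.1" 404 50 "-" "AgentA/1.0"
10.0.0.2 - - [31/Aug/2012:14:02:00 +0800] "GET /missing HTTP/1.1" 404 50 "-" "AgentB/2.0"
10.0.0.3 - - [31/Aug/2012:14:03:00 +0800] "GET /gone HTTP/1.1" 404 50 "-" "AgentA/1.0"
10.0.0.4 - - [31/Aug/2012:14:04:00 +0800] "GET / HTTP/1.1" 200 100 "-" "AgentA/1.0"
EOF

# Count 404 responses per URI in one awk pass, then sort by count, highest first.
not_found=$(awk '$9 == 404 { n[$7]++ } END { for (u in n) print n[u], u }' "$log" | sort -nr)
printf '%s\n' "$not_found"
rm -f "$log"
```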

Now suppose a request (for example, the URI /path/to/notfound) produces a large number of 404 errors. We can find out which referer the request came from, and which browser sent it, using the following command.

awk -F\" '($2 ~ "^GET /path/to/notfound "){print $4,$6}' access.log

Tracking down who is hotlinking the site's images

System administrators sometimes find that other sites embed images hosted on their own site. If you want to know exactly who is using your website's images without authorization, you can use the following command:

awk -F\" '($2 ~ /\.(jpg|gif|png)/ && $4 !~ /^http:\/\/www\.example\.com/) {print $4}' access.log \
| sort | uniq -c | sort

Note: Before using, change www.example.com to the domain name of your website.

    • Split each line on the " character;
    • The request line must include ".jpg", ".gif", or ".png";
    • The referer does not start with your site's domain name string (in this case, www.example.com);
    • Displays all referenced pages and counts the number of occurrences.
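The steps above can be sketched on a hypothetical log in which other.example.net embeds two images and www.example.com is assumed to be your own site. Only image requests with a foreign referer should be reported.

```shell
# Build a tiny hypothetical access log in a temporary file.
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 - - [31/Aug/2012:14:01:00 +0800] "GET /img/logo.png HTTP/1.1" 200 100 "http://www.example.com/index.html" "AgentA/1.0"
10.0.0.2 - - [31/Aug/2012:14:02:00 +0800] "GET /img/logo.png HTTP/1.1" 200 100 "http://other.example.net/page" "AgentA/1.0"
10.0.0.3 - - [31/Aug/2012:14:03:00 +0800] "GET /img/cat.jpg HTTP/1.1" 200 100 "http://other.example.net/page" "AgentB/2.0"
10.0.0.4 - - [31/Aug/2012:14:04:00 +0800] "GET /index.html HTTP/1.1" 200 100 "http://third.example.org/" "AgentB/2.0"
EOF

# Keep image requests whose referer is not our own site, then count each referer.
hotlinkers=$(awk -F\" '($2 ~ /\.(jpg|gif|png)/ && $4 !~ /^http:\/\/www\.example\.com/) {print $4}' "$log" \
  | sort | uniq -c | sort -nr | sed 's/^ *//')
printf '%s\n' "$hotlinkers"
rm -f "$log"
```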

Commands related to client IP addresses

Count the total number of distinct client IPs:

awk '{print $1}' access.log | sort | uniq | wc -l

Count the number of pages that each IP visited:

awk '{++S[$1]} END {for (a in S) print a, S[a]}' log_file

Sort the number of pages accessed per IP from smallest to largest:

awk '{++S[$1]} END {for (a in S) print S[a], a}' log_file | sort -n
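The array S is an awk associative array keyed by IP address, and the order in which "for (a in S)" visits keys is unspecified, which is why the output is piped through sort. A small sketch on hypothetical data:

```shell
# Build a tiny hypothetical access log in a temporary file.
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 - - [31/Aug/2012:14:01:00 +0800] "GET / HTTP/1.1" 200 100 "-" "AgentA/1.0"
10.0.0.2 - - [31/Aug/2012:14:02:00 +0800] "GET /a HTTP/1.1" 200 100 "-" "AgentA/1.0"
10.0.0.1 - - [31/Aug/2012:14:03:00 +0800] "GET /b HTTP/1.1" 200 100 "-" "AgentA/1.0"
10.0.0.3 - - [31/Aug/2012:14:04:00 +0800] "GET /c HTTP/1.1" 200 100 "-" "AgentB/2.0"
10.0.0.3 - - [31/Aug/2012:14:05:00 +0800] "GET /d HTTP/1.1" 200 100 "-" "AgentB/2.0"
10.0.0.1 - - [31/Aug/2012:14:06:00 +0800] "GET /e HTTP/1.1" 200 100 "-" "AgentA/1.0"
EOF

# S[ip] accumulates the request count per client IP; sort -n orders by count ascending.
per_ip=$(awk '{ ++S[$1] } END { for (a in S) print S[a], a }' "$log" | sort -n)
printf '%s\n' "$per_ip"
rm -f "$log"
```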

See which pages are accessed by an IP (for example, 202.106.19.100):

grep '^202\.106\.19\.100 ' access.log | awk '{print $1,$7}'

Count how many distinct IPs accessed the site during the 14:00 hour on August 31, 2012:

awk '{print $4,$1}' access.log | grep 31/Aug/2012:14 | awk '{print $2}' | sort | uniq | wc -l
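The same question can also be answered in a single awk process, which avoids the grep and sort steps. This is an alternative sketch on a hypothetical log; the array name seen is arbitrary.

```shell
# Build a tiny hypothetical access log in a temporary file.
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 - - [31/Aug/2012:14:01:00 +0800] "GET / HTTP/1.1" 200 100 "-" "AgentA/1.0"
10.0.0.1 - - [31/Aug/2012:14:30:00 +0800] "GET /a HTTP/1.1" 200 100 "-" "AgentA/1.0"
10.0.0.2 - - [31/Aug/2012:14:59:00 +0800] "GET /b HTTP/1.1" 200 100 "-" "AgentB/2.0"
10.0.0.3 - - [31/Aug/2012:15:00:00 +0800] "GET /c HTTP/1.1" 200 100 "-" "AgentB/2.0"
EOF

# Match the timestamp field against the hour, and count each IP only the first time it is seen.
count=$(awk '$4 ~ /^\[31\/Aug\/2012:14:/ && !seen[$1]++ { n++ } END { print n + 0 }' "$log")
printf 'distinct IPs in the 14:00 hour: %s\n' "$count"
rm -f "$log"
```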

List the top 10 IP addresses by number of requests:

awk '{print $1}' access.log | sort | uniq -c | sort -nr | head -10

Commands related to response size

List the files with the largest transfer size:

cat access.log | awk '{print $10 " " $1 " " $4 " " $7}' | sort -nr | head -100

List the pages whose response size exceeds 200000 bytes (about 200 KB) and the number of times each page occurs:

cat access.log | awk '($10 > 200000){print $7}' | sort -n | uniq -c | sort -nr | head -100
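A short sketch of the size filter on hypothetical data. Here awk reads the file directly instead of going through cat, and the paths /big.iso and /video.mp4 are invented for illustration.

```shell
# Build a tiny hypothetical access log in a temporary file.
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 - - [31/Aug/2012:14:01:00 +0800] "GET /a HTTP/1.1" 200 500 "-" "AgentA/1.0"
10.0.0.2 - - [31/Aug/2012:14:02:00 +0800] "GET /big.iso HTTP/1.1" 200 300000 "-" "AgentA/1.0"
10.0.0.3 - - [31/Aug/2012:14:03:00 +0800] "GET /video.mp4 HTTP/1.1" 200 250000 "-" "AgentB/2.0"
10.0.0.4 - - [31/Aug/2012:14:04:00 +0800] "GET /big.iso HTTP/1.1" 200 300000 "-" "AgentB/2.0"
10.0.0.5 - - [31/Aug/2012:14:05:00 +0800] "GET /b HTTP/1.1" 200 1000 "-" "AgentA/1.0"
EOF

# Keep responses larger than 200000 bytes ($10) and count how often each URI ($7) appears.
large=$(awk '($10 > 200000) {print $7}' "$log" | sort | uniq -c | sort -nr | sed 's/^ *//')
printf '%s\n' "$large"
rm -f "$log"
```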

Commands related to page response time

If the last column of the log records the time taken to serve the request (%T), it can be used to analyze response times. For example, we can customize the log format as:

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %T" combined

You can use the following command to count all log records that have a response time of more than 3 seconds.

awk '($NF > 3){print $0}' access.log

Note: NF is the number of fields in the current record. $NF is the last field.

List the requests that take longer than 5 seconds, with the number of times each occurs:

awk '($NF > 5){print $0}' access.log | awk -F\" '{print $2}' | sort -n | uniq -c | sort -nr | head -20
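The following sketch assumes a log whose last field is the %T value in seconds, as in the custom format above. The records and the trailing times are hypothetical.

```shell
# Build a tiny hypothetical access log whose last field is the response time in seconds.
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 - - [31/Aug/2012:14:01:00 +0800] "GET /slow HTTP/1.1" 200 100 "-" "AgentA/1.0" 7
10.0.0.2 - - [31/Aug/2012:14:02:00 +0800] "GET /slow HTTP/1.1" 200 100 "-" "AgentA/1.0" 6
10.0.0.3 - - [31/Aug/2012:14:03:00 +0800] "GET /fast HTTP/1.1" 200 100 "-" "AgentA/1.0" 1
10.0.0.4 - - [31/Aug/2012:14:04:00 +0800] "GET /report HTTP/1.1" 200 100 "-" "AgentB/2.0" 9
EOF

# Keep records slower than 5 seconds ($NF), extract the request line, and count each one.
slow=$(awk '($NF > 5) {print $0}' "$log" | awk -F\" '{print $2}' | sort | uniq -c | sort -nr | sed 's/^ *//')
printf '%s\n' "$slow"
rm -f "$log"
```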
