Writing a small squid access-log analysis script in Python

Source: Internet
Author: User


In the past two weeks, several people in the group have wanted to learn Python, so we created an environment and atmosphere for everyone to learn together.

Yesterday I posted a requirement in the group: count and sort the IP addresses and URLs in the squid access log. Several people have already implemented it; below is my simple implementation. Feedback and criticism are welcome:

The log format is as follows:

%ts.%03tu %6tr %{X-Forwarded-For}>h %Ss/%03Hs %<st %rm %ru %un %Sh/%<A %mt "%{Referer}>h" "%{User-Agent}>h" %{Cookie}>h

A sample log line looks like this:
1372776321.285      0 100.64.19.225 TCP_HIT/200 8560 GET http://img1.jb51.net/games/0908/19/1549401_3_80x100.jpg - NONE/- image/jpeg "http://www.bkjia.com/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; QQDownload 734; .NET4.0C; .NET CLR 2.0.50727)" pcsuv=0;%20pcuvdata=lastAccessTime=1372776317582;%20u4ad=33480hn;%20c=14arynt;%20uf=1372776310453
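To see why the script pulls fields 2 and 6, here is a quick check (a minimal sketch, using a shortened copy of the sample line) of the whitespace-split field positions:

```python
# Split a sample access-log line on whitespace: field 2 is the client ip
# and field 6 is the requested url (indices are 0-based).
sample = ("1372776321.285 0 100.64.19.225 TCP_HIT/200 8560 GET "
          "http://img1.jb51.net/games/0908/19/1549401_3_80x100.jpg - NONE/- image/jpeg")
fields = sample.split()
print(fields[2])  # -> 100.64.19.225
print(fields[6])  # -> http://img1.jb51.net/games/0908/19/1549401_3_80x100.jpg
```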

The script is as follows:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import sys
from optparse import OptionParser
'''
A small test on log files: count the ip addresses (and urls) in access.log
'''
try:
    f = open('/data/proclog/log/squid/access.log')
except IOError, e:
    print "can't open the file: %s" % (e)
    sys.exit(1)

def log_report(field):
    '''
    Return the requested field of each access-log line
    '''
    if field == "ip":
        return [line.split()[2] for line in f]
    if field == "url":
        return [line.split()[6] for line in f]

def log_count(field):
    '''
    Return a dict like {field: number}
    '''
    fields2 = {}
    fields = log_report(field)
    for field_tmp in fields:
        if field_tmp in fields2:
            fields2[field_tmp] += 1
        else:
            fields2[field_tmp] = 1
    return fields2

def log_sort(field, number=10, reverse=True):
    '''
    Print the sorted fields to stdout
    '''
    for v in sorted(log_count(field).iteritems(), key=lambda x: x[1], reverse=reverse)[0:int(number)]:
        print v[1], v[0]

if __name__ == "__main__":
    parser = OptionParser(usage="%prog [-i|-u] [-n num] [-r]", version="1.0")
    parser.add_option('-n', '--number', dest="number", type=int, default=10, help="print top lines of the output")
    parser.add_option('-i', '--ip', dest="ip", action="store_true", help="print ip information of access log")
    parser.add_option('-u', '--url', dest="url", action="store_true", help="print url information of access log")
    parser.add_option('-r', '--reverse', action="store_true", dest="reverse", help="reverse output")
    (options, args) = parser.parse_args()

    if len(sys.argv) < 2:
        parser.print_help()
    if options.ip and options.url:
        parser.error('-i and -u cannot be used at the same time')
    if options.ip:
        log_sort("ip", options.number, bool(options.reverse))
    if options.url:
        log_sort("url", options.number, bool(options.reverse))

    f.close()
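On modern Python 3 the same counting and ranking can be sketched with `collections.Counter`; this is a hypothetical rewrite of the counting logic, not part of the original script:

```python
from collections import Counter

def top_fields(lines, index, number=10):
    # Count the whitespace-split field at `index` (2 = ip, 6 = url) and
    # return the `number` most frequent values as (field, count) pairs.
    counts = Counter(line.split()[index] for line in lines if line.strip())
    return counts.most_common(number)

log_lines = [
    "1372776321.285 0 100.64.19.225 TCP_HIT/200 8560 GET http://a/x - NONE/- image/jpeg",
    "1372776322.101 0 100.64.19.225 TCP_MISS/200 901 GET http://a/y - DIRECT/1.2.3.4 text/html",
    "1372776323.512 0 100.64.20.1 TCP_HIT/200 8560 GET http://a/x - NONE/- image/jpeg",
]
for field, count in top_fields(log_lines, 2):
    print(count, field)
```

`Counter.most_common` already sorts in descending order, which replaces both the manual dict counting and the `sorted(...)` call.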

The effect is as follows:


The squid proxy log in Linux is too large.

You can disable access logging in the squid configuration file. For example, change
cache_access_log /squid/logs/access.log
to
cache_access_log none
and squid will no longer generate access logs.

If you do not disable it in squid.conf, squid will keep writing log files, and you must rotate them periodically to prevent them from growing too large. Squid writes a great deal of important information to its logs; if it cannot write, squid reports an error and exits.
Run the following command to rotate the log files:
% squid -k rotate

For example, the following crontab entry rotates the logs at 4 a.m. every day:

0 4 * * * /usr/local/squid/sbin/squid -k rotate

This command does two things. First, it closes the currently open log files. Then it renames cache.log, store.log, and access.log by appending a numeric extension: cache.log becomes cache.log.0, the old cache.log.0 becomes cache.log.1, and so on.
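The renaming scheme can be sketched in a few lines of Python (an illustration of the rotation idea only; squid does this internally):

```python
import os

def rotate(path, keep=10):
    # Shift numbered copies up (log.8 -> log.9, ..., log.0 -> log.1),
    # then move the live file aside as log.0 so a fresh file can be opened.
    for i in range(keep - 1, 0, -1):
        older = "%s.%d" % (path, i - 1)
        if os.path.exists(older):
            os.rename(older, "%s.%d" % (path, i))
    if os.path.exists(path):
        os.rename(path, path + ".0")
```

In squid itself, the number of numbered copies kept is controlled by the `logfile_rotate` directive.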

crontab is the Linux job scheduler: it runs programs automatically at the times you configure. There is plenty of information online about how to use it.

For squid, we recommend the Chinese translation of the authoritative Squid guide at home.arcor.de/pangj/squid/, which covers most of these topics.

I want to write a log analysis tool in Java that extracts the content we need from the log according to a given rule.

Here is one idea: have a thread read the log, match records against your exception rules, and output the matching records to the page.
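Whatever the implementation language, the core of that idea is a filter loop over the log records; a minimal Python sketch (the rule and the field layout here are assumptions):

```python
import re

def matching_records(lines, pattern=r"/[45]\d\d "):
    # Yield only the log records whose status code matches the rule
    # (here: any 4xx/5xx response), ready to be pushed to a page or alert.
    rule = re.compile(pattern)
    for line in lines:
        if rule.search(line):
            yield line

log_lines = [
    "1372776321.285 0 100.64.19.225 TCP_HIT/200 8560 GET http://a/x",
    "1372776322.101 0 100.64.19.225 TCP_MISS/404 901 GET http://a/y",
]
bad = list(matching_records(log_lines))
```

For a live log, the same loop would read from a tailing file handle instead of a list.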
 
