Topic: How to parse Web access logs using PythonContent
Python Basics
string, dictionary, file, time
Web Access Logs
Actual combat
Questions
Main lecturer: KKMulti-language mashup engineer, love open source technology, like get new skills, 5 years of PHP, Python project dev
behavior sample logs:
202.189.63.115--[31/aug/2008:15:42:31 +0800] "get/http/1.1" 1365 "
" "mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) gecko/20100101 firefox/15.0.1 "
set up scroll logs
Because the WEB server can be a huge number of visits a day, we need to write the access log separately in different log files, which prevents a single file from being too
In linux, we can use crontab to regularly move access. log to the backup directory. At the same time, we can signal USR1 to the nginx main process to generate a new log file.
Before writing a script, perform the following assumptions:
The log file is/usr/local/nginx/logs/access. log.
The nginx master process id is saved in the file:/usr/local/nginx/
Wdcp configures and processes site access logsWdcp starts from Version 2.2 and supports web Log cutting, saving by day, compressing and packaging, and setting retention and daysSpecific settings are as follows:System settings are shown in figureIn this way, you can automatically cut, compress, and retain the last 7 days.NoteHere is the general switch settings. Specifically, the log function must be enabled on the site to be effective. If site
Configure Apache access logs and log cuts1. Open the Apache master configuration file, command:vim/usr/local/apache2/conf/httpd.conf, locate the Log_config_module module, You can see two logformat(log format) as shown in:650) this.width=650; "Src=" https://s3.51cto.com/wyfs02/M00/8E/5F/wKioL1i-xNOTigYyAAAdvEazfcE062.png-wh_500x0-wm_ 3-wmp_4-s_1863003497.png "title=" 11.PNG "alt=" Wkiol1i-xnotigyyaaadveazfce
AWStats analyzes Nginx access logs
AWStats is a fast-growing Perl-based WEB log analysis tool on Sourceforge.
It can collect the following information about your site:
Visits (UV), visits, page views (PV), clicks, data traffic, etc.
Accurate to monthly, daily, and hourly data
Visitor country
Visitor IP
Robots/Spiders statistics
Guest duration
Statistics on different Files types
Pages-URL statistics
I have encountered a requirement in my recent work to analyze nginx access logs and find it appropriate to use python. Therefore, the following article mainly introduces how to analyze nginx access logs using python regular expressions, for more information, see the following.
Preface
The script in this article analyz
execute 301 for a permanent jump, 302 for a temporary jump, We usually use 301. After changing the configuration file we need to detect the syntax and reloadThen we also need to see if the LoadModule rewrite_module modules/mod_rewrite.so is loaded in the Apache configuration file, and if it is commented, then we need to reload it.[Email protected] ~]#/usr/local/apache2.4/bin/apachectl-m |grep rewriteRewrite_module (Shared) (loaded successfully)If modified, we also have to do syntax detection a
Solve the problem that Nginx logs cannot obtain the remote access ip Address
The company has an application where the backend Web uses Nginx. All Nginx requests are forwarded by the front-end proxy. All the variables used to obtain the remote ip in the log format use
$ Http_x_forwarded_for was originally used well, but one day the log analysis script showed that the IP addresses of many requests were empty,
Configure server. XML in log (Tomcat container)
The following is the excerpt text
Original link: http://forum.ospod.com/post-25088-1.fhtml;jsessionid=3361F472A5E12B9B9BEA1632EC50603A
Access log valve is used to create log files in the same format as standard Web server log files. You can use log analysis tools to analyze logs and track page clicks and user session activities. Many configurations and
The package that parses the logCompile yourself:packageApacheLogParser.jarIt's good to have simple analysis of grep for access logs, but more complex queries require spark.Code:Import Com.alvinalexander.accesslogparser._val p =NewAccesslogparservalLog= Sc.textfile ("Log.small")//log.count//analyze the Apache log in 404 how manyDef getstatuscode ( Line: Option[accesslogrecord]) = { LineMatch { CaseSome (L) =
Use Python to analyze Nginx access logs, split the logs according to the Nginx log format, and store them to the MySQL database.I. Nginx access log format:Copy codeThe Code is as follows:$ Remote_addr-$ remote_user [$ time_local] "$ request" $ status $ body_bytes_sent "$ http_referer" "$ http_user_agent" "$ http_x_forw
Requirements: The Nginx access log is checked in real-time, and if malicious access is added to the Iptables list for deny settings. The format of the access log is the default formatThe keyword in the zz_r variable of the regular expression increases or decreases itself. Currently in use ... .1 ImportOs,sys2 Importsubprocess3 ImportRe4 5 6 #access_log= '/usr/loc
Recovering hacker attacks from a small number of access logs
In the martial arts world, we often mention "bodies can talk", while in the world of cyber attack and defense, logs are the most important means of tracking. Today, we will talk about how to restore the entire hacker attack process and common attack methods through just a few lines of
/awstats.www.sn.com. conf
You shoshould have a look inside to check and change manually main parameters.
You can then manually update your statistics for 'www .sn.com 'with command:
> Perl awstats. pl-update-config = www.sn.com
You can also read your statistics for 'www .sn.com 'with URL:
> Http: // localhost/awstats. pl? Config = www.sn.com
Press ENTER to finish...
2. Modify the awstats. www. benet. conf configuration file.
[Root @ test conf] # vi/etc/awstats/awstats.www.sn.com. conf
LogFile
(error and correct log) of the automatic task to Cutnginxlog.log"Command >> 2>1" means that the correct output and error output are saved to the same file in an append way
4. Reference Links:
Nginx Automatic Cutting access log
Storing Nginx logs by date as file name
'). addclass (' pre-numbering '). Hide (); $ (this). addclass (' has-numbering '). Parent (). append ($numbering); for (i = 1; i '). Text
1, the number of IP direct output display:cat access_log_2011_06_26.log |awk ‘{print $1}‘|uniq -c|wc -l2, the number of IP output to the text display:cat access_log_2011_06_26.log |awk ‘{print $1}‘|uniq -c|wc -l > ip.txtSummary: If a single access log is larger than 2G, the system load will rise when viewed with this command, so do not view when the server is under high load, preferably in a low load time period. The above is one of the company's adve
The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion;
products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the
content of the page makes you feel confusing, please write us an email, we will handle the problem
within 5 days after receiving your email.
If you find any instances of plagiarism from the community, please send an email to:
info-contact@alibabacloud.com
and provide relevant evidence. A staff member will contact you within 5 working days.