How to analyze and view Apache logs by date in Linux

Source: Internet
Author: User
Tags: apache log file, apache log
I. How to configure Apache logs

In the Apache configuration file, the default log directives are:

    ErrorLog logs/error_log
    CustomLog logs/access_log common

On Linux, change them to pipe the logs through rotatelogs:

    ErrorLog "|/usr/local/apache/bin/rotatelogs /home/logs/www/%Y_%m_%d_error_log 86400 480"
    CustomLog "|/usr/local/apache/bin/rotatelogs /home/logs/www/%Y_%m_%d_access_log 86400 480" common

On Windows:

    ErrorLog "|bin/rotatelogs.exe logs/vicp_net_error-%y%m%d.log 86400 480"
    CustomLog "|bin/rotatelogs.exe logs/vicp_net_access-%y%m%d.log 86400 480" common

The first time I did not know to set the 480 parameter, which left an 8-hour difference between the log timestamps and the server time. It turns out rotatelogs takes an offset parameter giving the difference from UTC in minutes; China is in the eighth time zone (UTC+8), a difference of 480 minutes. 86400 seconds is one day.

Appendix: rotatelogs syntax

    rotatelogs logfile [rotationtime [offset]] | [filesizeM]

logfile: the base name of the log file. If logfile contains '%', it is treated as a strftime(3) format string; otherwise a .nnnnnnnnnn suffix (seconds since the epoch) is appended automatically. Both forms encode the start time of the new log.
rotationtime: the interval between log file rollovers, in seconds.
offset: the offset from UTC in minutes. If omitted, zero is assumed and UTC time is used. For example, for a zone 5 hours behind UTC, this parameter should be -300.
filesizeM: roll over when the file reaches the given size in megabytes (the M suffix), instead of rolling over at a time interval.
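As a quick sanity check on the pieces used above, here is a small shell sketch; the pattern string and the UTC+8 arithmetic come from the directives above, and everything else is illustrative (date(1) understands the same strftime codes rotatelogs does):

```shell
# Preview the file name the rotatelogs pattern %Y_%m_%d_access_log
# expands to for the current date, using date(1)'s strftime support.
pattern='%Y_%m_%d_access_log'
name=$(date -u +"$pattern")        # e.g. 2010_01_09_access_log
echo "$name"

# The rotatelogs offset is minutes relative to UTC: China (UTC+8) needs 480.
offset_minutes=$((8 * 60))
echo "$offset_minutes"
```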
II. Setting the Apache log record format

Customizing the log file format involves two directives: LogFormat and CustomLog. By default, the httpd.conf file provides several examples of both. The LogFormat directive defines a format and gives it a name, which can then be referenced directly. The CustomLog directive sets the log file and specifies the format it uses (usually by the format name).

For example, the default httpd.conf file contains this line:

    LogFormat "%h %l %u %t \"%r\" %>s %b" common

This directive creates a log format named "common"; the format itself is the content enclosed in double quotes. Each variable in the format string stands for one piece of information, written to the log file in the order given by the format string. The Apache documentation lists all the variables that can be used in format strings and their meanings:

    %...a          remote IP address
    %...A          local IP address
    %...B          bytes sent, excluding HTTP headers
    %...b          bytes sent in CLF format, excluding HTTP headers; e.g. when no data is sent, '-' is written instead of 0
    %...{FOOBAR}e  content of the environment variable FOOBAR
    %...f          file name
    %...h          remote host
    %...H          request protocol
    %...{Foobar}i  content of the Foobar header line in the request sent to the server
    %...l          remote login name (from identd, if supplied)
    %...m          request method
    %...{Foobar}n  content of the note "Foobar" from another module
    %...{Foobar}o  content of the Foobar response header line
    %...p          port the server used to respond to the request
    %...P          ID of the child process that serviced the request
    %...q          query string (prefixed with "?" if a query string exists; otherwise an empty string)
    %...r          first line of the request
    %...s          status; for internally redirected requests, this is the status of the *original* request, while %...>s is the status of the final request
    %...t          time, in common log format (standard English format)
    %...{format}t  time, in the given strftime format
    %...T          time taken to serve the request, in seconds
    %...u          remote user (from auth; may be bogus if the returned status (%s) is 401)
    %...U          URL path requested by the user
    %...v          ServerName of the server answering the request
    %...V          server name according to the UseCanonicalName setting

In all the variables listed above, "..." stands for an optional condition. If no condition is specified, the variable's value is always substituted. Looking again at the LogFormat example from the default httpd.conf, it creates a log format named "common" that includes: remote host, remote login name, remote user, request time, first line of the request, request status, and bytes sent.

Sometimes we only want to record certain specific information in the log; that is what "..." is for. If one or more HTTP status codes are placed between "%" and the variable, the variable is only logged when the response status matches one of those codes. For example, to record all broken links on a website:

    LogFormat "%404{Referer}i" BrokenLinks

Conversely, to record requests whose status code is *not* a given value, just add a "!" symbol:

    LogFormat "%!200U" SomethingWrong
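To see how the "common" format maps onto an actual log line, here is a small sketch; the sample line is made up, and with a plain space split field 1 is %h, field 9 is %>s, and field 10 is %b:

```shell
# A made-up line in the "common" format: %h %l %u %t \"%r\" %>s %b
line='61.135.168.14 - - [22/Oct/2008:22:21:26 +0800] "GET / HTTP/1.1" 200 8427'

# Split on spaces: $1 = remote host, $9 = final status, $10 = bytes sent.
summary=$(echo "$line" | awk '{print $1, $9, $10}')
echo "$summary"                    # -> 61.135.168.14 200 8427
```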
III. Logging spider visits separately

    SetEnvIfNoCase User-Agent baiduspider baidu_robot
    LogFormat "%h %t \"%r\" %>s %B" robot

On Linux:

    CustomLog "|/usr/local/apache2.2.0/bin/rotatelogs /usr/local/apache2.2.0/logs/release 86400 480" robot env=baidu_robot

On Windows:

    CustomLog "|bin/rotatelogs.exe logs/release 86400" robot

Restart Apache; each record then looks like this:

    61.135.168.14 [22/Oct/2008:22:21:26 +0800] "GET / HTTP/1.1" 200 8427

IV. Excluding images, js, css, and swf files from logs

    SetEnvIf Request_URI "\.(gif|jpg|png|js|css|swf)$" IMAG
    CustomLog "|bin/cronolog.exe logs/cpseadmin/access_%Y%m%d.log" combined env=!IMAG

V. Clearing error.log and access.log and limiting Apache log file size

The error.log and access.log files had not been touched since the server was installed. Today Discuz suddenly reported MySQL connection error 2003; it turned out error.log and access.log had filled the disk, reaching 30 GB between them. Those two had to go. The following method, found on the Internet, took effect immediately.

An example setup on Windows:

Step 1: delete the error.log and access.log files in the Apache2/logs directory.

Step 2: open Apache's httpd.conf configuration file, find the following two directives:

    ErrorLog logs/error.log
    CustomLog logs/access.log common

comment them out, and replace them with:

    # Restrict the error log file to 1M
    ErrorLog "|bin/rotatelogs.exe -l logs/error-%Y-%m-%d.log 1M"
    # Or generate a new error log file every day
    # ErrorLog "|bin/rotatelogs.exe -l logs/error-%Y-%m-%d.log 86400"
    # Restrict the access log file to 1M
    CustomLog "|bin/rotatelogs.exe -l logs/access-%Y-%m-%d.log 1M" common
    # Or generate a new access log file every day
    # CustomLog "|bin/rotatelogs.exe -l logs/access-%Y-%m-%d.log 86400" common
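The 1M size trigger above can be emulated by hand to see the decision rotatelogs makes; this is only an illustrative sketch, and the temporary file and its 2 KB size are invented for the demo:

```shell
# rotatelogs' "1M" threshold is one megabyte; compare a file's byte count to it.
limit=$((1024 * 1024))
f=$(mktemp)
head -c 2048 /dev/zero > "$f"      # a 2 KB demo file, well under the limit
size=$(wc -c < "$f")
if [ "$size" -gt "$limit" ]; then verdict=rotate; else verdict=keep; fi
echo "$verdict"                    # -> keep
rm -f "$f"
```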
Reference: handling access.log and error.log under Apache

In the past few days several members told me the website was getting slower and slower. I checked and found that the two log files under Apache2 were very large, more than 800 MB in total: access.log and error.log. So I found a method on the Internet to slim them down by generating them on a daily basis; old files can then simply be deleted.

In Apache's httpd.conf configuration file, find these two lines:

    ErrorLog logs/error.log
    CustomLog logs/access.log common

and change them to:

    CustomLog "|D:/apache2/bin/rotatelogs.exe D:/apache2/logs/access_%Y_%m_%d.log 86400 480" common
    ErrorLog "|D:/apache2/bin/rotatelogs.exe D:/apache2/logs/error_%Y_%m_%d.log 86400 480"

It is that simple: each log starts a new file every day, no single file grows too large to open, and the old log files can be deleted.

After running on a web server for a while, access.log can reach tens or even hundreds of megabytes, and if Apache hits errors, error.log grows to tens of megabytes as well. Reading and writing a huge text file is very memory-intensive, so it is necessary to limit log file size. For the configuration options, see http://httpd.apache.org/docs/2.0/programs/rotatelogs.html; the rotatelogs.exe program (in the {$apache}/bin/ directory) limits log file size. Usage:

    rotatelogs [-l] logfile [rotationtime [offset]] | [filesizeM]

For example, add one of these to httpd.conf:

    TransferLog "|rotatelogs /some/where 86400"
    TransferLog "|rotatelogs /some/where 5M"

The generated name will be /some/where.nnnn, where nnnn is the system time at which the log nominally starts (N.B. when using a rotation time, this time is always a multiple of the rotation time, so cron scripts can be synchronized with it). A new log is started at the end of each rotation time or when the file size is reached. The Windows settings are the same 1M and daily-rotation directives shown in the previous section; Linux/Unix is similar.

Clearing Apache's access.log: a client's server built with Apache had recently become very slow, and sometimes the website would not open at all. Investigation showed the cause was its access.log and error.log, which must be read and cleared regularly; left alone when you are busy, the two files balloon in a short time until they cannot be opened. (I also suspect crawlers are involved; I will check which ones are crawling my sites tomorrow.) The fix is the same daily rotation:

    CustomLog "|D:/thridparty-system/java/apache2/bin/rotatelogs.exe D:/thridparty-system/java/apache2/logs/access_%Y_%m_%d.log 86400 480" common
    ErrorLog "|D:/thridparty-system/java/apache2/bin/rotatelogs.exe D:/thridparty-system/java/apache2/logs/error_%Y_%m_%d.log 86400 480"

Each log then starts a new file every day, no single file becomes too large to open, and old log files can be deleted.

Another way to deal with a growing ACCESS.LOG: in httpd.conf, change

    CustomLog logs/access.log common

to

    CustomLog "|c:/apache/bin/rotatelogs c:/apache/logs/%y_%m_%d.access.log 86400 480" common

and restart Apache (c:/apache/ is the path where you installed Apache); a new log file is then generated every day.

VI. How to view and analyze Apache logs

Assume the Apache log format is:

    118.78.199.98 - - [09/Jan/2010:00:59:59 +0800] "GET /Public/Css/index.css HTTP/1.1" 304 - "http://www.a.cn/Common/index.php" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; GTB6.3)"

Question 1: find the 10 most frequently accessed IP addresses in the Apache log.

    awk '{print $1}' apache_log | sort | uniq -c | sort -nr | head -n 10

awk first extracts the IP address from each log line (if the log format has been customized, use -F to set the delimiter and print the appropriate column); sort groups identical records together; uniq -c merges duplicate lines and counts the repetitions; sort -nr sorts by count in reverse numeric order; head keeps the first 10 lines.

For reference, the same idea displays the 10 most used shell commands:

    sed -e 's/|/\n/g' ~/.bash_history | cut -d ' ' -f 1 | sort | uniq -c | sort -nr | head

Question 2: find the minutes with the most accesses in the Apache log.

    awk '{print $4}' access_log | cut -c 14-18 | sort | uniq -c | sort -nr | head

The fourth space-separated field is [09/Jan/2010:00:59:59; cut -c extracts characters 14 to 18 (the hour and minute), and the rest works as in question 1.

Question 3: find the most visited pages in the Apache log.

    awk '{print $11}' apache_log | sed 's/^.*cn\(.*\)"/\1/g' | sort | uniq -c | sort -rn | head

This is similar to questions 1 and 2; the only special part is using sed's substitution to replace "http://www.a.cn/Common/index.php" with the part in parentheses in "http://www.a.cn(/Common/index.php)".
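The Question 1 pipeline can be tried on an inline sample to confirm what each stage does; the three log lines below are made up:

```shell
# Count requests per IP and keep the busiest one, exactly as in Question 1
# but with head -n 1 on a tiny inline log.
top=$(printf '%s\n' \
  '10.0.0.1 - - [09/Jan/2010:00:59:59 +0800] "GET /a HTTP/1.1" 200 10' \
  '10.0.0.1 - - [09/Jan/2010:01:00:01 +0800] "GET /b HTTP/1.1" 200 12' \
  '10.0.0.2 - - [09/Jan/2010:01:00:02 +0800] "GET /a HTTP/1.1" 200 10' \
  | awk '{print $1}' | sort | uniq -c | sort -nr | head -n 1)
echo "$top"                        # a count of 2 for 10.0.0.1
```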
Question 4: find the busiest times (by minute) in the Apache log, then see which IP addresses made the most requests at those times.

1. View the Apache processes:

    ps aux | grep httpd | grep -v grep | wc -l

2. View the TCP connections on port 80:

    netstat -tan | grep "ESTABLISHED" | grep ":80" | wc -l

3. Check the per-IP connection counts for the day from the log, filtering duplicates:

    cat access_log | grep "19/May/2011" | awk '{print $2}' | sort | uniq -c | sort -nr

4. What the IP with the most connections was doing that day (it turned out to be a spider):

    cat access_log | grep "19/May/2011:00" | grep "61.135.166.230" | awk '{print $8}' | sort | uniq -c | sort -nr | head -n 10

5. The top 10 URLs accessed that day:

    cat access_log | grep "19/May/2010:00" | awk '{print $8}' | sort | uniq -c | sort -nr | head -n 10

6. Use tcpdump to sniff access to port 80 and see which IP connects most:

    tcpdump -i eth0 -tnn dst port 80 -c 1000 | awk -F"." '{print $1"."$2"."$3"."$4}' | sort | uniq -c | sort -nr

then look up what that IP is doing in the access log:

    cat access_log | grep 220.181.38.183 | awk '{print $1"\t"$8}' | sort | uniq -c | sort -nr | less

7. View the number of IP connections in a given time period:

    grep "2006:0[7-8]" www20110519.log | awk '{print $2}' | sort | uniq -c | sort -nr | wc -l

8. The 20 IP addresses with the most connections to the web server:

    netstat -ntu | awk '{print $5}' | sort | uniq -c | sort -n -r | head -n 20

9. View the 10 IP addresses with the most visits in the log:

    cat access_log | cut -d ' ' -f 1 | sort | uniq -c | sort -nr | head -n 10 | less

10. View IP addresses that appear more than 100 times in the log:

    cat access_log | cut -d ' ' -f 1 | sort | uniq -c | awk '{if ($1 > 100) print $0}' | sort -nr | less

11. View the most recently accessed files:

    cat access_log | tail -10000 | awk '{print $7}' | sort | uniq -c | sort -nr | less

12. View the files accessed more than 100 times in the log:

    cat access_log | cut -d ' ' -f 7 | sort | uniq -c | awk '{if ($1 > 100) print $0}' | less

13. List files whose transfer took more than 30 seconds:

    cat access_log | awk '($NF > 30){print $7}' | sort -n | uniq -c | sort -nr | head -20

14. List the most time-consuming pages (more than 60 seconds) and the number of occurrences of each:

    cat access_log | awk '($NF > 60 && $7 ~ /\.php/){print $7}' | sort -n | uniq -c | sort -nr | head -100
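Commands 13 and 14 assume the request time in seconds (%T) was appended as the last field of the LogFormat; here is the idea on a made-up two-line log:

```shell
# Keep requests whose last field (seconds taken) exceeds 30 and print the
# URL field; with the standard layout plus a trailing %T, the URL is $7.
slow=$(printf '%s\n' \
  '1.1.1.1 - - [19/May/2011:08:00:00 +0800] "GET /fast HTTP/1.1" 200 10 1' \
  '2.2.2.2 - - [19/May/2011:08:00:01 +0800] "GET /slow.php HTTP/1.1" 200 10 45' \
  | awk '($NF > 30){print $7}')
echo "$slow"                       # -> /slow.php
```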