Enterprise shell script analysis and Apache log cutting practices

Source: Internet
Author: User
Tags: month name, apache, log

Analysis of Apache logs using enterprise shell scripts


I. Analyzing Apache logs

1. There is a file shell.sh with the following content:
[root@test3 ~]# cat shell.sh
http://www.baidu.com/index.html
http://www.google.com/index.html
http://www.baidu.com/get.html
http://www.baidu.com/set.html
http://www.google.com/index.html
http://www.yahoo.com.cn/put.html
We need to extract the domain name from each line, count how many times each domain appears, and sort the results in descending order by count. The expected output is:
3 www.baidu.com
2 www.google.com
1 www.yahoo.com.cn

sort treats each line of the file as a unit, comparing lines character by character from the first character onward by ASCII value and outputting them in ascending order; uniq removes adjacent duplicate lines, keeping only one copy of each.

[root@test3 ~]# awk -F "/" '{print $3}' shell.sh | sort | uniq -c
      3 www.baidu.com
      2 www.google.com
      1 www.yahoo.com.cn
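Note that the output above happens to be in descending order only because the counts line up with the alphabetical order of the domains. To guarantee a descending sort by count in the general case, a minimal sketch appends a numeric reverse sort:
[root@test3 ~]# awk -F "/" '{print $3}' shell.sh | sort | uniq -c | sort -nr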

2. Find the 10 most frequently accessed IP addresses in the Apache log.

The format of /usr/local/apache2/logs/access_log is as follows:

192.168.46.1 - chen [21/Sep/2013:14:04:48 +0800] "GET /phpmyadmin/themes/pmahomme/img/tab_hover_bg.png HTTP/1.1" 200 502

[root@test3 ~]# awk '{print $1}' /usr/local/apache2/logs/access_log | sort | uniq -c | head -n 10
      7 127.0.0.1
    228 192.168.46.1
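As written, the pipeline stops at uniq -c, so the first ten lines are ordered by IP address rather than by hit count. A minimal sketch that actually returns the ten most frequent IPs adds a numeric reverse sort before head:
[root@test3 ~]# awk '{print $1}' /usr/local/apache2/logs/access_log | sort | uniq -c | sort -nr | head -n 10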

3. Find the most frequently accessed minutes in the Apache log.

The format of /usr/local/apache2/logs/access_log is as follows:

192.168.46.1 - chen [21/Sep/2013:14:04:48 +0800] "GET /phpmyadmin/themes/pmahomme/img/tab_hover_bg.png HTTP/1.1" 200 502

[root@test3 ~]# awk '{print $4}' /usr/local/apache2/logs/access_log | cut -c 14-18 | sort | uniq -c | sort -nr | head
     33 13:55
     30 13:35
     19 13:22
     15 13:54
     15 13:45
     15 13:38
     15 13:36
     13 13:04
     10 12:59
      9 13:18
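Here $4 is the timestamp field (e.g. [21/Sep/2013:14:04:48), and cut -c 14-18 extracts the hour:minute portion (characters 14-18, e.g. 14:04). An equivalent sketch that avoids cut by using awk's substr function would be:
[root@test3 ~]# awk '{print substr($4,14,5)}' /usr/local/apache2/logs/access_log | sort | uniq -c | sort -nr | head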

4. Find the most visited pages in the Apache log.

The format of /usr/local/apache2/logs/access_log is as follows:

192.168.46.1 - chen [21/Sep/2013:14:04:48 +0800] "GET /phpmyadmin/themes/pmahomme/img/tab_hover_bg.png HTTP/1.1" 200 502

[root@test3 ~]# awk '{print $7}' /usr/local/apache2/logs/access_log | sort | uniq -c | sort -nr | head
     46 /
     44 /phpmyadmin/
     10 /phpmyadmin/js/jquery/jquery-1.6.2.js?ts=1359376847
      9 /phpmyadmin/js/update-location.js?ts=1359376847
      9 /phpmyadmin/js/jquery/jquery-ui-1.8.16.custom.js?ts=1359376847
      9 /phpmyadmin/js/jquery/jquery.qtip-1.0.0-rc3.js?ts=1359376847
      9 /phpmyadmin/js/functions.js?ts=1359376847
      8 /phpmyadmin/js/cross_framing_protection.js?ts=1359376847
      7 /phpmyadmin/themes/pmahomme/jquery/jquery-ui-1.8.16.custom.css
      7 /phpmyadmin/themes/pmahomme/img/sprites.png
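If you want to count hits per page regardless of the query string (the ?ts=... parts above), one possible sketch strips everything after the first '?' before counting:
[root@test3 ~]# awk '{print $7}' /usr/local/apache2/logs/access_log | cut -d '?' -f 1 | sort | uniq -c | sort -nr | head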

5. In the Apache log, find the time periods (by minute) with the largest number of accesses, i.e. the heaviest load. Then see which IP addresses account for the most accesses during those times.

The format of /usr/local/apache2/logs/access_log is as follows:

192.168.46.1 - chen [21/Sep/2013:14:04:48 +0800] "GET /phpmyadmin/themes/pmahomme/img/tab_hover_bg.png HTTP/1.1" 200 502

The access counts per time period are as follows:
[root@test3 ~]# awk '{print $4}' /usr/local/apache2/logs/access_log | cut -c 9-18 | uniq -c | sort -nr | head
     33 2013:13:55
     30 2013:13:35
     19 2013:13:22
     15 2013:13:54
     15 2013:13:45
     15 2013:13:38
     15 2013:13:36
     13 2013:13:04
     10 2013:12:59
      9 2013:13:18
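The second half of the question (which IP addresses generate the most traffic during the busiest minute) is not shown above. A minimal sketch, assuming the busiest minute found above is 13:55 on 21/Sep/2013, would be:
[root@test3 ~]# grep "21/Sep/2013:13:55" /usr/local/apache2/logs/access_log | awk '{print $1}' | sort | uniq -c | sort -nr | head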

6. Apache-related system operations

1. View the number of Apache processes:
ps aux | grep httpd | grep -v grep | wc -l
2. View the number of established TCP connections on port 80:
netstat -tan | grep "ESTABLISHED" | grep ":80" | wc -l
3. Check the number of IP connections for the current day from the log, filtering out duplicates:
cat access_log | grep "19/May/2011" | awk '{print $2}' | sort | uniq -c | sort -nr
4. See what the IP address with the most connections was doing that day (it turned out to be a spider):
cat access_log | grep "19/May/2011:00" | grep "61.135.166.230" | awk '{print $8}' | sort | uniq -c | sort -nr | head -n 10
5. Top 10 URLs visited on the current day:
cat access_log | grep "19/May/2010:00" | awk '{print $8}' | sort | uniq -c | sort -nr | head -n 10
6. Use tcpdump to sniff access to port 80 and see which IP accesses it the most:
tcpdump -i eth0 -tnn dst port 80 -c 1000 | awk -F"." '{print $1"."$2"."$3"."$4}' | sort | uniq -c | sort -nr
Then check in the log what that IP is doing:
cat access_log | grep 220.181.38.183 | awk '{print $1"\t"$8}' | sort | uniq -c | sort -nr | less
7. View the number of IP connections during a certain period of time:
grep "2006:0[7-8]" www20110519.log | awk '{print $2}' | sort | uniq -c | sort -nr | wc -l
8. The 20 IP addresses with the most connections to the web server:
netstat -ntu | awk '{print $5}' | sort | uniq -c | sort -n -r | head -n 20
9. View the top 10 IPs with the most visits in the log:
cat access_log | cut -d ' ' -f 1 | sort | uniq -c | sort -nr | awk '{print $0}' | head -n 10 | less
10. View IPs that appear more than 100 times in the log:
cat access_log | cut -d ' ' -f 1 | sort | uniq -c | awk '{if ($1 > 100) print $0}' | sort -nr | less
11. View the most-accessed files among the most recent requests:
cat access_log | tail -10000 | awk '{print $7}' | sort | uniq -c | sort -nr | less
12. View pages accessed more than 100 times:
cat access_log | cut -d ' ' -f 7 | sort | uniq -c | awk '{if ($1 > 100) print $0}' | less
13. List files whose transfer took more than 30 seconds:
cat access_log | awk '($NF > 30){print $7}' | sort -n | uniq -c | sort -nr | head -20
14. List the most time-consuming pages (taking more than 60 seconds) and how often each occurs:
cat access_log | awk '($NF > 60 && $7 ~ /\.php/){print $7}' | sort -n | uniq -c | sort -nr | head -100
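For convenience, several of the one-liners above can be wrapped into a small report script. The following is only a minimal sketch; the default log path and the "top 10" limit are assumptions and should be adjusted to your environment:

#!/bin/bash
# quick_apache_report.sh - print a few quick statistics for an Apache access log
# Usage: ./quick_apache_report.sh [path-to-access_log]
LOG="${1:-/usr/local/apache2/logs/access_log}"   # assumed default path

echo "== Top 10 client IPs =="
awk '{print $1}' "$LOG" | sort | uniq -c | sort -nr | head -n 10

echo "== Top 10 busiest minutes (HH:MM) =="
awk '{print substr($4,14,5)}' "$LOG" | sort | uniq -c | sort -nr | head -n 10

echo "== Top 10 requested pages =="
awk '{print $7}' "$LOG" | sort | uniq -c | sort -nr | head -n 10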

II. Log Cutting
Install cronolog
In CentOS 6.0, an Apache server compiled and installed from source does not cut (rotate) its logs by default; you can use the cronolog tool to cut the logs.
1. Download and install
wget http://cronolog.org/download/cronolog-1.6.2.tar.gz
tar zxvf cronolog-1.6.2.tar.gz
cd cronolog-1.6.2
./configure
make && make install
2. Run the which command to view the path to verify installation.
which cronolog
Default path: /usr/local/sbin/cronolog
3. Configuration
vi /usr/local/apache/conf/httpd.conf
Define the access log:
CustomLog "|/usr/local/sbin/cronolog /usr/local/apache/logs/access_%Y%m%d.log" combined
Define the error log:
ErrorLog "|/usr/local/sbin/cronolog /home/www/ex/log/error_%Y%m%d.log"
After saving the configuration file, reload or restart the Apache service for the change to take effect:
service httpd restart
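Before restarting, it can be worth checking the configuration syntax. A minimal sketch, assuming the apachectl binary from the same installation lives in /usr/local/apache/bin, is:
/usr/local/apache/bin/apachectl -t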
Another method is to use rotatelogs.
Configuration method on a Linux system:
Change the directives to:
ErrorLog "|/usr/local/apache/bin/rotatelogs /usr/local/apache/logs/%Y_%m_%d_error_log 86400 480"
CustomLog "|/usr/local/apache/bin/rotatelogs /usr/local/apache/logs/%Y_%m_%d_access_log 86400 480" common
Configuration method on Windows:
# ErrorLog "|bin/rotatelogs.exe logs/error-%y%m%d.log 86400 480"
# CustomLog "|bin/rotatelogs.exe logs/access-%y%m%d.log 86400 480" common
Apache Log Cutting
In Apache's main configuration file, make the following changes:
Comment out the following two lines:
ErrorLog logs/error_log
CustomLog logs/access_log common
Then add the following two lines
ErrorLog "|/usr/local/apache/bin/rotatelogs /usr/local/apache/logs/errorlog.%Y-%m-%d-%H_%M_%S 2M +480"
CustomLog "|/usr/local/apache/bin/rotatelogs /usr/local/apache/logs/accesslog.%Y-%m-%d-%H_%M_%S 2M +480" common
Meaning:
errorlog.%Y-%m-%d-%H_%M_%S is the format of the generated log file names, producing names like errorlog.2010-04-15-11_32_30, broken down by year, month, day, hour, minute, and second.
2M is the log size: when the current log file reaches this size, a new log file is generated. The supported units are K, M, and G; here the value is 2M.
+480 is the time offset in minutes. rotatelogs bases its timestamps on UTC; China is 8 hours (480 minutes) ahead of UTC, so an offset of 480 minutes is used.
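As a concrete illustration (a sketch only, with the same assumed paths as above), rotating the access log once per day at local midnight in China (UTC+8) combines a rotation time of 86400 seconds with the 480-minute offset:
CustomLog "|/usr/local/apache/bin/rotatelogs /usr/local/apache/logs/access_%Y%m%d.log 86400 480" combined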
Other settings are as follows:
Generate an error log file every day:
ErrorLog "|bin/rotatelogs.exe -l logs/error-%Y-%m-%d.log 86400"
86400 is the rotation time in seconds (i.e. one day).
Reference: http://hi.baidu.com/jiaofu1127/blog/item/15fed5fa19895b47342acc4a.html
Reference: http://man.chinaunix.net/newsoft/ApacheMenual_CN_2.2new/programs/rotatelogs.html
rotatelogs - piped logging program for rotating Apache logs
rotatelogs is a simple program for use with Apache's piped logging feature. Example:
CustomLog "|bin/rotatelogs /var/logs/logfile 86400" common
This configuration creates files named /var/logs/logfile.nnnn, where nnnn is the system time at which the log nominally starts (this time is always a multiple of the rotation time, so it can be used to synchronize cron scripts). At the end of each rotation time (here, after 24 hours), a new log file is opened.
CustomLog "|bin/rotatelogs /var/logs/logfile 5M" common
This configuration rotates the log file whenever its size reaches 5 megabytes.
ErrorLog "|bin/rotatelogs /var/logs/errorlog.%Y-%m-%d-%H_%M_%S 5M"
This configuration rotates the error log whenever its size reaches 5 megabytes, with log file names of the form errorlog.YYYY-mm-dd-HH_MM_SS.
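Since rotatelogs simply reads log lines from standard input, it can also be tried out by hand outside of Apache. A minimal sketch (the /tmp path is only an example) is:
echo "test entry" | /usr/local/apache/bin/rotatelogs /tmp/test_log.%Y%m%d 86400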
Syntax
rotatelogs [ -l ] logfile [ rotationtime [ offset ]] | [ filesizeM ]
Options
-l
Use local time rather than GMT as the base for file naming. Note: using -l in an environment where the GMT offset changes (for example, when daylight saving time starts or ends) may lead to unpredictable results.
logfile
The path and base name of the log file. If logfile contains any '%' characters, it is treated as a strftime format string; otherwise a suffix of the form .nnnnnnnnnn is automatically appended, giving the time in seconds. Both formats compute the start time of the new log. (An example follows this option list.)
rotationtime
The interval between log file rotations, in seconds.
offset
The number of minutes offset from UTC. If omitted, zero is assumed and UTC time is used. For example, to use local time in a zone that is 5 hours behind UTC, this argument should be -300.
filesizeM
Specifies that the file is rotated when it reaches filesize megabytes (the number followed by the letter M), rather than by rotation time or offset.
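To illustrate the two logfile naming modes with hypothetical paths (a sketch only):
CustomLog "|bin/rotatelogs /var/logs/access.%Y%m%d 86400" common
produces names such as /var/logs/access.20130921, while
CustomLog "|bin/rotatelogs /var/logs/access 86400" common
produces names such as /var/logs/access.1379721600, where the suffix is the nominal rotation start time in seconds since the epoch.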
Portability
The following log file format strings are supported by all strftime implementations; see the strftime man page for extensions provided by particular libraries.


%A    Full weekday name (localized)
%a    3-character weekday name (localized)
%B    Full month name (localized)
%b    3-character month name (localized)
%c    Date and time (localized)
%d    2-digit day of month
%H    2-digit hour (24-hour clock)
%I    2-digit hour (12-hour clock)
%j    3-digit day of year
%M    2-digit minute
%m    2-digit month
%p    am/pm of a 12-hour clock (localized)
%S    2-digit second
%U    2-digit week of year (Sunday is the first day of the week)
%W    2-digit week of year (Monday is the first day of the week)
%w    1-digit day of the week (Sunday is the first day of the week)
%X    Time (localized)
%x    Date (localized)
%Y    4-digit year
%y    2-digit year
%Z    Time zone name
%%    The literal '%' character
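To preview how a given format string will expand before putting it into a rotatelogs configuration, the date command can be used as a quick check (the output shown is only illustrative):
[root@test3 ~]# date +"errorlog.%Y-%m-%d-%H_%M_%S"
errorlog.2013-09-21-14_04_48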

This article is from the "Good to live" blog; please be sure to keep this source when republishing: http://wolfword.blog.51cto.com/4892126/1299831
