Nginx automatic access log rotation


 

The Web access log (access_log) records the access behavior of every external client toward the Web server, including the client IP address, access time, requested URL, HTTP status code returned by the server, and other important information.
A typical Web access log entry looks like this:


112.97.37.90 - - [14/Sep/2013:14:37:39 +0800] "GET / HTTP/1.1" 301 5 "-" "Mozilla/5.0 (Linux; U; Android 2.3.6; zh-cn; Lenovo A326 Build/GRK39F) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1 MicroMessenger/4.5.1.259" -
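Read against the log_format defined in the appendix below, the fields of this entry map to nginx's standard log variables roughly as follows (a sketch for orientation; the variable names come from the appendix, not from the log itself):

112.97.37.90                    # $remote_addr  (client IP)
- -                             # literal "-" separator and $remote_user (no authenticated user)
[14/Sep/2013:14:37:39 +0800]    # $time_local   (access time)
"GET / HTTP/1.1"                # $request      (method, URL, protocol)
301                             # $status       (HTTP status code)
5                               # $body_bytes_sent
"-"                             # $http_referer
"Mozilla/5.0 (...)"             # $http_user_agent (truncated here)
-                               # $http_x_forwarded_for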

 

Plan:

1. The problem to solve:

When website traffic is high, a large amount of log data is produced. If all logs are written to a single log file, that file keeps growing. Writing to a very large file (several hundred megabytes, for example) slows down, and downloading and opening such a file just to inspect the access log is also slow. You can use a free third-party log analysis tool such as logstore to upload nginx, Apache, and IIS log files and help analyze website security; after all, a dedicated tool is more professional. Logstore also limits each uploaded file to a maximum of 50 MB.

 

2. nginx provides no built-in mechanism for splitting (rotating) log files automatically, so you need to write your own script to do it.

 

The contents of the shell script file nginx_log_division.sh are as follows:

 

#!/bin/bash

logs_path="/data/wwwlogs/"

# name of the current log file
log_name="xxx.log"

pid_path="/usr/local/nginx/logs/nginx.pid"

# rename the current log file, appending last week's date as a backup suffix
mv ${logs_path}${log_name} ${logs_path}${log_name}_$(date --date="last week" +"%Y%m%d").log

# tell the nginx master process to reopen its log file
kill -USR1 `cat ${pid_path}`

 

The principle of the above shell script: first, rename the current log file so that it serves as a backup.

The new name is based on the date of the previous Monday ("last week" is seven days earlier, and the script is scheduled to run on Mondays). If the script runs on 2013-09-16, the generated file name is xxx.log_20130909.log.
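A quick way to check the date expression used in the script (assuming GNU date, as on most Linux systems):

date --date="last week" +"%Y%m%d"    # run on 2013-09-16, this prints 20130909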

Until kill -USR1 `cat ${pid_path}` is executed, even though the file has been renamed, nginx keeps writing log data into the renamed file xxx.log_20130909.log as usual. The reason: in Linux, the kernel locates an open file through its file descriptor, not through its name.

 

---------------- Understanding Linux file descriptors

 

A file descriptor is an integer identifier that the Linux kernel assigns to each file a process has opened.

For each process, the Linux kernel maintains a "file descriptor table", which records the files that the process has opened.

In this setup, nginx is a running process that has opened the log file, and that file is recorded in its file descriptor table.

Even if the log file's path (name) changes, nginx can still find it, because the file is located through the file descriptor table rather than by name. A small shell demonstration of this follows the note below.

----------------------------------------------
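A minimal sketch to observe this behavior with plain shell (the path /tmp/fd_demo.log is only an illustration):

exec 3>>/tmp/fd_demo.log            # open the file on descriptor 3
echo "first line" >&3
mv /tmp/fd_demo.log /tmp/fd_demo_renamed.log
echo "second line" >&3              # still lands in the renamed file
cat /tmp/fd_demo_renamed.log        # shows both lines
exec 3>&-                           # close descriptor 3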

In the command kill -USR1 `cat ${pid_path}`, the nginx.pid file simply stores a number (you can open it and check; here it is 894). nginx writes the PID (process number) of its master process into the nginx.pid file, so the cat command retrieves the master process number, letting you operate directly on that process.

 

kill -USR1 `cat ${pid_path}` is therefore equivalent to:

kill -USR1 894    # send the USR1 signal to the process with this number
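To confirm which process that number refers to (a sketch, reusing the pid path from the script above):

cat /usr/local/nginx/logs/nginx.pid                            # e.g. prints 894
ps -p $(cat /usr/local/nginx/logs/nginx.pid) -o pid,user,cmd   # should show the nginx master process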

 

In Linux, the system communicates with running processes through signals. There are many predefined signals, such as SIGHUP. USR1 is a user-defined signal: the process itself defines what to do when it receives this signal (the developers of the process decide whether to handle it or ignore it). nginx's own code handles this case for its log files: when it receives the USR1 signal, nginx reopens the log file. The procedure is as follows (a short way to observe it follows the three steps):

1. When the nginx master process receives the USR1 signal, it reopens the log file (using the name set by the access_log directive in the nginx configuration file; if the file does not exist, a new xxx.log is created automatically).

 

2. It changes the owner of the log file to the user that the worker processes run as, so that the worker processes have read and write permission on it (the master and worker processes generally run as different users, so the owner must be changed).

 

3. The nginx master process closes the log file with the old name (the file that was renamed to xxx.log_20130909.log by the mv command) and notifies the worker processes to use the newly opened log file (the xxx.log the master process just opened). In more detail, the master process passes the USR1 signal on to the workers; after a worker receives it, it reopens the log file (the xxx.log agreed in the configuration file).
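One way to watch this happen (a sketch, assuming the paths from the script above and that lsof is installed):

# before rotation: the master still holds the original xxx.log
lsof -p $(cat /usr/local/nginx/logs/nginx.pid) | grep xxx.log

mv /data/wwwlogs/xxx.log /data/wwwlogs/xxx.log_backup.log
kill -USR1 $(cat /usr/local/nginx/logs/nginx.pid)
sleep 1

# after USR1: a fresh xxx.log exists and nginx has reopened it
ls -l /data/wwwlogs/xxx.log
lsof -p $(cat /usr/local/nginx/logs/nginx.pid) | grep xxx.log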

 

 

===================================================== Scheduled execution of scripts

 

Add the shell script above as a scheduled task. crontab manages scheduled tasks in Linux: the cron daemon runs in the background and periodically checks its task list to see whether any task is due to run at the current time.

 

crontab -e

 

0 4 * * 1 /data/wwwlogs/nginx_log_division.sh

 

 

This opens a file; add the line above to it.

The format is: minute, hour, day of month, month, day of week, followed by the path of the shell script to execute. An asterisk (*) means "every": every minute, every hour, every month, and so on.

Here it is set to run the nginx_log_division.sh script at 4:00 every Monday; the script backs up the old log file and lets nginx start a new one.
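Two quick checks worth doing after saving the entry (a sketch; the script path is the one used above):

chmod +x /data/wwwlogs/nginx_log_division.sh   # make sure the script is executable
crontab -l                                     # list the installed cron entries to confirm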

 

 

Appendix: how to configure the nginx log

 

log_format site '$remote_addr - $remote_user [$time_local] "$request" '
                '$status $body_bytes_sent "$http_referer" '
                '"$http_user_agent" $http_x_forwarded_for';

 

access_log /data/wwwlogs/xxxx.com.log site;

# The second parameter names the log format to use; each log_format is given a name, and "site" here matches the name defined in log_format above.
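After changing the configuration, check the syntax and reload nginx (a sketch; the binary path assumes the /usr/local/nginx install layout used earlier):

/usr/local/nginx/sbin/nginx -t          # test the configuration for syntax errors
/usr/local/nginx/sbin/nginx -s reload   # reload so the new log_format / access_log take effect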

 

 

The above describes how to use the crontab scheduled task manager.

 

 

Some points here are still not fully understood and may be wrong; this will be updated later.
