How to use ELK to analyze Nginx server logs


All of the ELK installation packages can be downloaded from the official website. The download is a little slow, but acceptable. Official site: https://www.elastic.co/

Logstash

In Logstash 1.5.1 the pattern directory has moved; patterns are stored under /logstash/vendor/bundle/jruby/1.9/gems/logstash-patterns-core-0.1.10/. Fortunately the patterns directory can be pointed to in the configuration, so I created a patterns directory under the root directory of Logstash. The configuration directory also does not exist in the 1.5.1 version; with an RPM installation you can place configuration under /etc/logstash/conf.d/, but in my repeated tests startup failed frequently and I have not yet found the reason (personally I do not recommend installing from the RPM package). You can simply start Logstash with nohup or screen.
The dedicated Nginx pattern is as follows:


NGINXACCESS %{IP:client} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:http_version})?|-)\" %{HOST:domain} %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:useragent} \"(%{IP:x_forwarder_for}|-)\"
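To use this custom pattern from a Logstash configuration, the grok filter can be pointed at the directory that holds it. A minimal sketch, assuming the pattern above is saved in a file inside the patterns directory created under the Logstash root as described earlier:

filter {
  grok {
    patterns_dir => ["./patterns"]                 # directory containing the NGINXACCESS pattern file (assumed path)
    match => { "message" => "%{NGINXACCESS}" }
  }
}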

Because this is a test environment, I simply let Logstash read the Nginx log file to obtain the Nginx logs, and I only read the access log; I am not interested in the error log.

With Logstash 2.2.0, create a conf folder under the Logstash program directory to hold the configuration files for parsing logs, and create a file named test.conf in it with the following content:

input {
  file {
    path => ['/var/log/nginx/access.log']
  }
}
filter {
  grok {
    match => {
      "message" => "%{IPORHOST:clientip} \[%{HTTPDATE:time}\] \"%{WORD:verb} %{URIPATHPARAM:request} HTTP/%{NUMBER:httpversion}\" %{NUMBER:http_status_code} %{NUMBER:bytes} \"(?<http_referer>\S+)\" \"(?<http_user_agent>(\S+\s+)*\S+)\""
    }
  }
}

What needs explaining is the grok part of the filter section. Because the Nginx log is formatted, Logstash's approach to parsing it is to match the log against a regular expression and save each field into a variable. In Logstash the grok plug-in does this parsing; the message part of the grok block is the corresponding Grok expression, which is not exactly the same as a plain regular expression because it also carries the variable (field) names.

The Grok syntax itself is not described in detail here; it can be learned from the official Logstash documentation. However, the pattern names used in Grok expressions, such as IPORHOST, are not documented in one obvious place; I could only find their definitions by running grep -nr "IPORHOST" under the Logstash installation directory.
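As an illustration of that search, the command and the sort of definition it turns up look roughly like the following (the exact path and definition text depend on the installed version):

grep -nr "IPORHOST" /path/to/logstash/vendor/
# .../patterns/grok-patterns:  IPORHOST (?:%{IP}|%{HOSTNAME})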

The stdout section of the configuration file prints the Grok parsing results; it should definitely be enabled during the debugging phase (a sketch of the output section is shown below).

This is also where you can verify that the syntax of a Grok expression is correct; you can write and test Grok expressions this way.

The Elasticsearch part is not introduced in detail here; material on it is easy to find online.
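To make the above concrete, here is a minimal sketch of the output section of test.conf, with stdout for debugging and Elasticsearch for storage; the host and port are assumptions for a local test instance:

output {
  elasticsearch {
    hosts => ["localhost:9200"]   # assumed local Elasticsearch instance
  }
  stdout {
    codec => rubydebug            # prints every parsed event, useful while debugging the Grok expression
  }
}

With the file saved as conf/test.conf, the configuration can be checked and then started in the background with nohup, as mentioned earlier; --configtest only validates the syntax:

bin/logstash -f conf/test.conf --configtest
nohup bin/logstash -f conf/test.conf &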


Using ELK to collect and analyze Nginx access logs

Redis is used as a queue (push and pop), and a logstash_indexer pops the data from the queue, analyzes it, and inserts it into Elasticsearch. The benefit is scalability: logstash_agent only needs to collect logs into the queue, while the potentially bottlenecked log analysis is done by logstash_indexer, which can be scaled horizontally; I can run multiple indexers on separate machines to analyze and store the logs.
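To summarize the flow described above:

nginx access log → logstash_agent → redis list ("logstash:redis") → logstash_indexer → elasticsearch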

Now, let's go through the configuration in more detail.

Log storage format in Nginx

Nginx handles both GET and POST requests. The parameters of a GET request appear directly in the URL recorded in the log, but the parameters of a POST request are not reflected in the access log. Since I also want the POST parameters stored, I need to define a custom log format.


log_format logstash '$http_host $server_addr $remote_addr [$time_local] "$request" $request_body $status $body_bytes_sent "$http_referer" "$http_user_agent" $request_time $upstream_response_time';


Here $request_body is the body of the POST request, so the POST parameters are stored in $request_body, while the parameters of a GET request remain inside $request. How to analyze them concretely is something we will come back to in the indexer.

$server_addr stores the IP of the current web machine, which is used during analysis to trace the original source of the log.

The following is an example of a GET request:


api.yejianfeng.com 10.171.xx.xx 100.97.xx.xx [10/Jun/2015:10:53:24 +0800] "GET /api1.2/qa/getquestionlist/?limit=10&source=ios&token=12343425324&type=1&uid=304116&ver=1.2.379 HTTP/1.0" - 200 2950 "-" "TheMaster/1.2.379 (iPhone; iOS 8.3; Scale/2.00)" 0.656 0.654


The following is an example of a POST request:


api.yejianfeng.com 10.171.xx.xx 100.97.xx.xx [10/Jun/2015:10:53:24 +0800] "POST /api1.2/user/mechanicupdate/ HTTP/1.0" start_time=1276099200&lng=110.985723&source=android&uid=328910&lat=35.039471&city=140800 200 754 "-" "-" 0.161 0.159


By the way, this log format is defined in nginx.conf; remember to also reference it in the access_log directive of the specific server blocks, for example:

listen       80;   # assumed port; the value was lost in the original
server_name  api.yejianfeng.com;
access_log   /mnt/logs/api.yejianfeng.com.logstash.log logstash;

Configuration of logstash_agent

This configuration only needs to push the logs into the Redis queue, so the output section is simply set to redis.

input {
  file {
    type => "nginx_access"
    path => ["/mnt/logs/api.yejianfeng.com.logstash.log"]
  }
}
output {
  redis {
    host => "10.173.xx.xx"
    port => 8001
    password => "pass"
    data_type => "list"
    key => "logstash:redis"
  }
}
Configuration of logstash_indexer

The logstash_indexer configuration is more involved; three parts need to be configured:

Input: responsible for fetching log data from Redis
Filter: responsible for analyzing and structuring the log data
Output: responsible for storing the structured data into Elasticsearch
Input section

input {
    redis {
        host => "10.173.xx.xx"
        port => 8001
        password => "pass"
        data_type => "list"
        key => "logstash:redis"
    }
}

Of course, the Redis configuration should be consistent with the agent.

Filter section
The parsing can be worked out with the online Grok Debugger, checking the pattern against the log format piece by piece. The Grok syntax is still fairly complex, but fortunately the online tool makes it manageable. For the GET and POST log format above, the Grok expression I arrived at is as follows:


%{IPORHOST:http_host} %{IPORHOST:server_ip} %{IPORHOST:client_ip} \[%{HTTPDATE:timestamp}\] \"%{WORD:http_verb} (?:%{PATH:baseurl}\?%{NOTSPACE:params}(?: HTTP/%{NUMBER:http_version})?|%{DATA:raw_http_request})\" (%{NOTSPACE:params})?|- %{NUMBER:http_status_code} (?:%{NUMBER:bytes_read}|-) %{QS:referrer} %{QS:agent} %{NUMBER:time_duration:float} %{NUMBER:time_backend_response:float}


A small trick here is params: so that both GET and POST parameters end up in the same field, the pattern captures a params field in both the GET case and the POST case.

OK, now params holds the request parameters, for example source=ios&uid=123. But when I finally run statistics, I want to ask questions like "how many calls had source set to ios", so the parameters need to be structured. We also want newly added interface parameters to be usable directly, without modifying logstash_indexer. This is what kv is for: kv structures a string of k=v pairs. The prefix option adds a prefix to each resulting key for use in statistics, include_keys sets which keys are included, and exclude_keys sets which keys are excluded.

kv {
  prefix => "params."
  field_split => "&"
  source => "params"
}

Well, now there is another problem: if a request contains Chinese, the Chinese is stored urlencoded in the log. When analyzing, for example, an interface like /api/search?keyword=... where the keyword is Chinese, and we want to rank the hottest keyword queries, the value needs to be decoded first. Logstash also has a urldecode filter; urldecode can be set on a single field, or it can be set to decode all fields.

urldecode {
  all_fields => true
}

Everything seems fine, but when you actually run it you will find that the timestamp in Elasticsearch and the time in the request log are not the same. The reason is that ES stores, by default, the time the event entered ES, rather than the timestamp from the log. So how do we make the time in ES match the time in the request log? Use the date filter. The settings are as follows:

date {
    locale => "en"
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
}

The complete filter section in logstash_indexer is as follows:

filter {
    grok {
      match => [
            "message", "%{IPORHOST:http_host} %{IPORHOST:server_ip} %{IPORHOST:client_ip} \[%{HTTPDATE:timestamp}\] \"%{WORD:http_verb} (?:%{PATH:baseurl}\?%{NOTSPACE:params}(?: HTTP/%{NUMBER:http_version})?|%{DATA:raw_http_request})\" (%{NOTSPACE:params})?|- %{NUMBER:http_status_code} (?:%{NUMBER:bytes_read}|-) %{QS:referrer} %{QS:agent} %{NUMBER:time_duration:float} %{NUMBER:time_backend_response:float}"
      ]
    }
    kv {
      prefix => "params."
      field_split => "&"
      source => "params"
    }
    urldecode {
      all_fields => true
    }
    date {
        locale => "en"
        match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
    }
}

Output section
This part is very simple: send the data to ES.

output {
    elasticsearch {
        embedded => false
        protocol => "http"
        host => "localhost"
        port => "9200"
        user => "yejianfeng"
        password => "yejianfeng"
    }
}

There is a user and password here; in fact, Elasticsearch with the Shield plug-in can be forced to require username/password login, and the output here is configured for that use.
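As a rough sketch of how that username/password requirement can be set up on the Elasticsearch side (these commands are for the Shield plug-in of the Elasticsearch 2.x era and may differ for other versions; the credentials are the ones used in the output above):

bin/plugin install license
bin/plugin install shield
bin/shield/esusers useradd yejianfeng -p yejianfeng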

Query Elasticsearch

Following the example above, suppose I want to query how params.source (really the source parameter; params. is just the prefix added earlier) was called within a certain time range:

$url = 'http://xx.xx.xx.xx:9200/logstash-*/_search';
$filter = '
{
  "query": {
    "range": {
      "@timestamp": {
        "gt": 123213213213,
        "lt": 123213213213
      }
    }
  },
  "aggs": {
    "group_by_source": {
      "terms": {"field": "params.source"}
    }
  },
  "size": 0
}';
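The same query can also be run directly with curl, shown here as a usage sketch (the endpoint, index pattern and credentials are the ones used earlier in this article; replace them with your own):

curl -u yejianfeng:yejianfeng -XGET 'http://xx.xx.xx.xx:9200/logstash-*/_search' -d '
{
  "query": {"range": {"@timestamp": {"gt": 123213213213, "lt": 123213213213}}},
  "aggs": {"group_by_source": {"terms": {"field": "params.source"}}},
  "size": 0
}'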
