Centralized Log Management with ELK: Logstash Grok in Detail

Source: Internet
Author: User
Tags: apache, log, logstash

A log line produced by a system or service is usually one long string in which the fields are separated by spaces. Logstash reads each log line as a single string. If the line is split into fields according to what each one means before it is passed to Elasticsearch, searches give better results and Kibana can chart the data more easily.

Grok is the most important plugin for Logstash. Its main job is to convert plain-text strings into structured data, using named patterns that are built on regular expressions.

Grok expressions

The following configuration splits an Apache log line into fields:

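filter {
  if [type] == "apache" {
    grok {
      # Split the raw line into named fields, one %{PATTERN:field} per log field.
      match => { "message" => "%{IPORHOST:addre} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] \"%{WORD:http_method} %{NOTSPACE:request} HTTP/%{NUMBER:httpversion}\" %{NUMBER:status} (?:%{NUMBER:bytes}|-) \"(?:%{URI:http_referer}|-)\" \"%{GREEDYDATA:user_agent}\"" }
      # The original line is no longer needed once it has been split.
      remove_field => ["message"]
    }
    date {
      # Parse the captured timestamp and use it as the event time.
      match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
    }
  }
}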
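The date filter at the end of this configuration parses the captured timestamp string. As a rough illustration only, the same layout can be parsed in Python; note that Python's strptime directives are used here, not the format letters Logstash expects, and the variable names are made up for this sketch.

```python
from datetime import datetime

# Timestamp as it appears between the square brackets of an Apache access log.
raw = "19/Jul/2016:16:28:52 +0800"

# day/abbreviated month/year:hour:minute:second, followed by the UTC offset.
parsed = datetime.strptime(raw, "%d/%b/%Y:%H:%M:%S %z")

print(parsed.isoformat())  # 2016-07-19T16:28:52+08:00
```

Once the string is a real timestamp, events can be sorted and charted by the time the request happened rather than the time the line was collected.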

Here is a sample Apache log line:

192.168.10.97 - - [19/Jul/2016:16:28:52 +0800] "GET / HTTP/1.1" "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36"

The fields in the log are separated by spaces, and each one corresponds, in order, to one expression in the match pattern for message.

For example: %{IPORHOST:addre} ==> 192.168.10.97

But IPORHOST is not a regular expression, so how can it match an IP address?

The answer is that IPORHOST is a grok expression, and it stands for the following regular expressions:

IPV6 ((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?

IPV4 (?<![0-9])(?:(?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(?![0-9])

IP (?:%{IPV6}|%{IPV4})

HOSTNAME \b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)

IPORHOST (?:%{IP}|%{HOSTNAME})

In other words, IPORHOST is a grok expression that matches an IPv4 address, an IPv6 address, or a hostname.
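To make that alternation concrete, here is a small Python sketch. It is not how Logstash implements IPORHOST: the standard ipaddress module stands in for the IPV4 and IPV6 patterns, the hostname regex is a transcription of the HOSTNAME pattern, and the function name classify is invented for this example.

```python
import ipaddress
import re

# Transcription of the HOSTNAME pattern: dot-separated labels of letters, digits and hyphens.
HOSTNAME = re.compile(
    r"\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)"
)

def classify(value):
    """Return 'ip', 'hostname' or None, trying the IP branch first as IPORHOST does."""
    try:
        ipaddress.ip_address(value)  # accepts both IPv4 and IPv6 literals
        return "ip"
    except ValueError:
        pass
    if HOSTNAME.fullmatch(value):
        return "hostname"
    return None

for candidate in ("192.168.10.97", "::1", "web01.example.com", "not a host!"):
    print(candidate, "->", classify(candidate))
```

Because the IP branch is tried first, a dotted address is reported as an IP even though the hostname pattern would accept it as well.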

IPORHOST is fairly complicated, so let's look at a simpler one, USER:

USERNAME [a-zA-Z0-9._-]+

USER %{USERNAME}

The first line defines a grok expression with an ordinary regular expression. The second line defines another grok expression in terms of the one just defined, by referring to it as %{USERNAME}.
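The way one definition builds on another can be sketched in a few lines of Python. This is only an illustration of the idea, assuming a plain dictionary of definitions; DEFINITIONS, REFERENCE and resolve are names made up for this example, not part of Logstash.

```python
import re

# Two definitions in the same "NAME regex" spirit as the lines above.
DEFINITIONS = {
    "USERNAME": r"[a-zA-Z0-9._-]+",
    "USER": r"%{USERNAME}",
}

# A reference to another definition looks like %{NAME}.
REFERENCE = re.compile(r"%\{(\w+)\}")

def resolve(name):
    """Replace every %{NAME} reference by its definition until only plain regex is left."""
    return REFERENCE.sub(lambda m: resolve(m.group(1)), DEFINITIONS[name])

print(resolve("USER"))  # [a-zA-Z0-9._-]+
```

The same recursion explains IPORHOST: it refers to IP and HOSTNAME, and IP in turn refers to IPV6 and IPV4.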

Now for the grok syntax itself: %{SYNTAX:SEMANTIC}

SYNTAX is the name of the pattern that matches the text, and SEMANTIC is the field name given to the matched value. You can choose the name freely, but it should describe the meaning of the field as clearly as possible.
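One way to picture %{SYNTAX:SEMANTIC} is as a named capture group. The Python sketch below rewrites each token into (?P<SEMANTIC>...) and matches a line with it. The WORD and NUMBER regexes are simplified stand-ins rather than the definitions shipped with Logstash, and the sample text and the helper to_regex are hypothetical.

```python
import re

# A tiny pattern library; simplified stand-ins for the shipped WORD and NUMBER patterns.
PATTERNS = {
    "WORD": r"\w+",
    "NUMBER": r"\d+(?:\.\d+)?",
}

# Matches a %{SYNTAX:SEMANTIC} token and captures both parts.
TOKEN = re.compile(r"%\{(\w+):(\w+)\}")

def to_regex(grok_expression):
    """Turn each %{SYNTAX:SEMANTIC} into a named group (?P<SEMANTIC>...)."""
    return TOKEN.sub(
        lambda m: "(?P<%s>%s)" % (m.group(2), PATTERNS[m.group(1)]),
        grok_expression,
    )

regex = to_regex(r"%{WORD:http_method} took %{NUMBER:seconds}s")
match = re.fullmatch(regex, "GET took 0.25s")
print(match.groupdict())  # {'http_method': 'GET', 'seconds': '0.25'}
```

The SEMANTIC names become the keys of the resulting record, which is why descriptive names pay off later in Elasticsearch and Kibana.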


So where is IPORHOST defined, and which expressions can we use directly?

Logstash is installed with a set of ready-made patterns. They are in the following directory:

/usr/local/logstash-2.3.4/vendor/bundle/jruby/1.9/gems/logstash-patterns-core-2.0.5/patterns

or can be viewed directly at https://github.com/elastic/logstash/blob/v1.4.2/patterns/grok-patterns

IPORHOST and USER above are both defined there, along with many other patterns that cover most needs.


Log Matching

Once we have a log and a grok expression written as above, how do we check that the expression really matches it correctly?

The debugger at http://grokdebug.herokuapp.com/ meets this testing need. Let's test it with the Apache log above.

After clicking, the following data appears, showing the value captured by each grok expression you wrote. For an accurate test, try several log lines.

{
  "addre": [
    [
      "192.168.10.97"
    ]
  ],
  "HOSTNAME": [
    [
      "192.168.10.97",
      "192.168.10.175"
    ]
  ........... (multiple lines omitted) ...........
  "http_referer": [
    [
      "http://192.168.10.175/"
    ]
  ],
  "URIPROTO": [
    [
      "http"
    ]
  ],
  "URIHOST": [
    [
      "192.168.10.175"
    ]
  ],
  "IPORHOST": [
    [
      "192.168.10.175"
    ]
  ],
  "user_agent": [
    [
      "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36"
    ]
  ]
}
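The same kind of check can be approximated offline. The Python sketch below uses a hand-written regex that is looser than the real grok patterns, with one named group per field, and runs it over several lines. The sample lines, including their status codes and byte counts, are invented for this illustration, and APACHE, SAMPLES and check are hypothetical names.

```python
import re

# Hand-written approximation of the Apache grok expression; each named group
# plays the role of one %{SYNTAX:SEMANTIC} field. Looser than the real patterns.
APACHE = re.compile(
    r'(?P<addre>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<http_method>\w+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<status>\d+) (?:(?P<bytes>\d+)|-) '
    r'"(?:(?P<http_referer>\w+://[^"]+)|-)" '
    r'"(?P<user_agent>.*)"'
)

# Invented sample lines: a full entry, one with "-" for bytes, and one non-matching line.
SAMPLES = [
    '192.168.10.97 - - [19/Jul/2016:16:28:52 +0800] "GET / HTTP/1.1" 200 23 "-" "Mozilla/5.0"',
    '192.168.10.97 - - [19/Jul/2016:16:28:53 +0800] "GET /a.png HTTP/1.1" 304 - '
    '"http://192.168.10.175/" "Mozilla/5.0"',
    'this line is not an access log entry',
]

def check(lines):
    """Return one dict of captured fields per line, or None where the pattern fails."""
    results = []
    for line in lines:
        m = APACHE.fullmatch(line)
        results.append(m.groupdict() if m else None)
    return results

for line, fields in zip(SAMPLES, check(SAMPLES)):
    print("OK  " if fields else "FAIL", line[:60])
```

Running a handful of real lines through a check like this, including ones with empty fields, catches most mistakes before the expression goes into the Logstash configuration.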

Every log has some fields that carry no data and show "-" instead, so the expression has to allow for that when matching.

For example: (?:%{NUMBER:bytes}|-)
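In plain regex terms this is an alternation between a captured number and a literal dash. A minimal Python rendering, with \d+ standing in for the NUMBER pattern:

```python
import re

# Either digits captured as "bytes", or a literal dash that captures nothing.
BYTES = re.compile(r"(?:(?P<bytes>\d+)|-)")

for raw in ("2326", "-"):
    print(repr(raw), "->", BYTES.fullmatch(raw).group("bytes"))
```

When the log shows "-", the match still succeeds but the bytes group stays empty, so the rest of the line can be parsed as usual.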

Some values are long and contain spaces themselves, such as: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36

For these we can use %{GREEDYDATA:browser}.

The corresponding grok expression is: GREEDYDATA .*
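The difference from a space-delimited pattern is easy to see in a short Python sketch, where .* plays GREEDYDATA and \S+ plays NOTSPACE. The shortened sample string is made up for the example.

```python
import re

# Tail of a log line: a "-" referrer followed by a quoted user agent containing spaces.
LINE = '"-" "Mozilla/5.0 (Windows NT 6.1; WOW64) Safari/537.36"'

# ".*" takes everything it can and only gives back the closing quote.
greedy = re.fullmatch(r'"-" "(?P<browser>.*)"', LINE)

# "\S+" stops at the first space, so it would cut the value short.
notspace = re.match(r'"-" "(?P<browser>\S+)', LINE)

print(greedy.group("browser"))    # Mozilla/5.0 (Windows NT 6.1; WOW64) Safari/537.36
print(notspace.group("browser"))  # Mozilla/5.0
```

Because it swallows everything, a greedy pattern works best as the last field of an expression.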


Custom Grok expressions

If the grok expressions that ship with Logstash do not meet your needs, you can define your own.

For example:

filter {
  if [type] == "apache" {
    grok {
      # Location of the custom grok expressions.
      patterns_dir => "/usr/local/logstash-2.3.4/ownpatterns/patterns"
      # APACHE_LOG is defined in the custom patterns file.
      match => { "message" => "%{APACHE_LOG}" }
      remove_field => ["message"]
    }
    date {
      match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
    }
  }
}

patterns_dir is the path to the grok expressions you have defined yourself.

Custom patterns are written in the same format as the ones that ship with Logstash.

APACHE_LOG %{IPORHOST:addre} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] \"%{WORD:http_method} %{NOTSPACE:request} HTTP/%{NUMBER:httpversion}\" %{NUMBER:status} (?:%{NUMBER:bytes}|-) \"(?:%{URI:http_referer}|-)\" \"%{GREEDYDATA:user_agent}\"

Here I have simply moved the grok expression that matches the Apache log into the custom patterns file, which simplifies the conf file. For regular expressions that match a single field, you can write and test your own in the same way.
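The file format is simple enough to read by hand: a name, whitespace, then the expression. The Python sketch below parses such text into a dictionary. The file contents and the function load_patterns are hypothetical, and skipping lines that start with # is an assumption based on how the shipped pattern files are commented.

```python
# Contents of a hypothetical custom patterns file: one "NAME regex" definition per line.
PATTERN_FILE = r"""
# comment lines and blank lines are skipped
USERNAME [a-zA-Z0-9._-]+
USER %{USERNAME}
"""

def load_patterns(text):
    """Split each definition at the first run of whitespace into name and regex."""
    patterns = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, regex = line.split(None, 1)
        patterns[name] = regex
    return patterns

print(load_patterns(PATTERN_FILE))
```

Only the first run of whitespace separates the name from the expression, so the expression itself may contain spaces, as APACHE_LOG does.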


If grok splits everything in "message" into separate fields, the same data is effectively stored twice. You can use the remove_field parameter to delete the message field, or use the overwrite parameter to replace the default message field so that only the most important part is kept.


This article is from the "Tranquility Zhiyuan" blog, please be sure to keep this source http://irow10.blog.51cto.com/2425361/1828077
