Logstash configuration and use for log analysis


Logstash is a data-processing tool that is primarily used to analyze logs. The whole stack can be thought of as an MVC application: Logstash is the controller layer, Elasticsearch is the model layer, and Kibana is the view layer.

Data first goes to Logstash, which filters and formats it (into JSON) and then passes it to Elasticsearch for storage and search indexing. Kibana provides the front-end pages for searching and chart visualization; it calls the Elasticsearch interface and renders the data that comes back. Logstash and Elasticsearch both run on the JVM, while Kibana is built on Node.js.

The official website, https://www.elastic.co/, has very detailed usage instructions; besides the docs, there are also video tutorials. This post collects some of the more important settings and usages from the docs and videos.

I. Logstash configuration

1. Define the data source

Write a configuration file that can be named logstash.conf and enter the following:

input {
        file {
                path => "/data/web/logstash/logfile/*/*"
                start_position => "beginning"  # read from the beginning of the file
        }
#       stdin {}  # data can also be read from standard input
}

This defines the data source. Supported inputs include files, stdin, Kafka, Twitter, and so on, and you can even write an input plugin yourself. If the file path is written with wildcards as above, Logstash will automatically pick up any new log file copied into those directories.

2. Define the format of the data

Match the log with regular expressions according to its format:

filter {

  # define the format of the data
  grok {
    match => { "message" => "%{DATA:timestamp}\|%{IP:serverip}\|%{IP:clientip}\|%{DATA:logsource}\|%{DATA:userid}\|%{DATA:requrl}\|%{DATA:requri}\|%{DATA:refer}\|%{DATA:device}\|%{DATA:textduring}\|%{DATA:duringtime:int}\|\|" }
  }

}

This is because the log records have the following format:

2015-05-07-16:03:04|10.4.29.158|120.131.74.116|web|11299073|http://quxue.renren.com/shareapp?isappinstalled=0&userid=11299073&from=groupmessage|/shareapp|null|Mozilla/5.0 (iPhone; CPU iPhone OS 8_2 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Mobile/12D508 MicroMessenger/6.1.5 NetType/WIFI|duringtime|98||

The fields are separated by the | symbol. The first is the access time, which is used as the Logstash timestamp, followed by: server IP, client IP, machine type (web/app/admin), the user's ID (0 if none), the full URL of the request, the requested controller path, the referer, device information, the literal text "duringtime", and the time spent on the request.

As in the code above, the fields are defined in turn and matched with a regular expression. DATA is a pattern predefined by Logstash (grok), essentially (.*?), and the part after the colon names the captured field.
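
To illustrate the %{PATTERN:field} syntax in isolation, here is a minimal sketch matching a hypothetical three-field line of the form time|ip|body (the field names are made up for this example; DATA, IP and GREEDYDATA are standard grok patterns):

filter {
  grok {
    # hypothetical line: "2015-05-07-16:03:04|120.131.74.116|hello world"
    # DATA is non-greedy (.*?), IP matches an IP address, GREEDYDATA is (.*)
    match => { "message" => "%{DATA:logtime}\|%{IP:clientip}\|%{GREEDYDATA:body}" }
  }
}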

We use the access time as the Logstash timestamp, so that later analysis of requests can be based on time. If this time cannot be matched, Logstash falls back to the current time as the timestamp for that record. The timestamp format, i.e. the format used in the log, needs to be defined inside the filter:

filter {

  # define the format of the data
  grok { # same as above ... }

  # define the format of the timestamp
  date {
    match => ["timestamp", "yyyy-MM-dd-HH:mm:ss"]
    locale => "cn"
  }

}

Among the fields above you also need to tell Logstash which one is the client IP; Logstash will then automatically look up the location information for that IP:

filter {

  # define the format of the data
  grok { # same as above }

  # define the format of the timestamp
  date { # same as above }

  # define which field is the client IP (from the data format defined above)
  geoip {
    source => "clientip"
  }
}

There is also the client's UA. Because UA strings come in many formats, Logstash can likewise parse them automatically and extract the operating system and other related information:

  # define which field is the client device (UA)
  useragent {
    source => "device"
    target => "userdevice"
  }

You also need to tell Logstash which fields are integers so that later analysis can sort on them; in this data only one field, the time spent, needs conversion:

  # fields that need type conversion; here the visit duration is converted to int before being sent to Elasticsearch
  mutate {
    convert => ["duringtime", "integer"]
  }
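
For reference, putting the snippets above together, the whole filter section looks roughly like this (simply a consolidation of the pieces already shown, nothing new):

filter {

  # define the format of the data
  grok {
    match => { "message" => "%{DATA:timestamp}\|%{IP:serverip}\|%{IP:clientip}\|%{DATA:logsource}\|%{DATA:userid}\|%{DATA:requrl}\|%{DATA:requri}\|%{DATA:refer}\|%{DATA:device}\|%{DATA:textduring}\|%{DATA:duringtime:int}\|\|" }
  }

  # define the format of the timestamp
  date {
    match => ["timestamp", "yyyy-MM-dd-HH:mm:ss"]
    locale => "cn"
  }

  # define which field is the client IP
  geoip {
    source => "clientip"
  }

  # define which field is the client device (UA)
  useragent {
    source => "device"
    target => "userdevice"
  }

  # convert the visit duration to an integer
  mutate {
    convert => ["duringtime", "integer"]
  }
}
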
3. Output Configuration

Finally, the output is configured to send the filtered data to Elasticsearch:

output {
  # save the output to Elasticsearch; if the time did not match, the record is not saved,
  # because URL parameters in the log may contain newlines
  if [timestamp] =~ /^\d{4}-\d{2}-\d{2}/ {
        elasticsearch { host => "localhost" }
  }

   # output to stdout
#  stdout { codec => rubydebug }

   # username and password for accessing the data
#  user => webservice
#  password => 1q2w3e4r
}

Save the above configuration to logstash.conf and then run Logstash.
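
Assuming Logstash was unpacked into the current directory (the path to the logstash binary depends on your installation), it can be started with this configuration file like so:

# start Logstash with the configuration file written above
bin/logstash -f logstash.conf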

After Logstash has finished starting, feed in the access record shown above, and Logstash will output the filtered data:
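
The original post showed a screenshot of this output. Roughly, with the stdout rubydebug codec enabled, the filtered event looks something like the following (an abridged, illustrative sketch; the exact geoip and userdevice fields depend on your GeoIP database and UA parser):

{
       "message" => "2015-05-07-16:03:04|10.4.29.158|120.131.74.116|web|...",
     "timestamp" => "2015-05-07-16:03:04",
      "serverip" => "10.4.29.158",
      "clientip" => "120.131.74.116",
    "duringtime" => 98,
         "geoip" => { ...location fields looked up from clientip... },
    "userdevice" => { ...OS/browser fields parsed from device... }
}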

You can see that Logstash automatically looks up the location of the IP and parses the device field of the request.

II. Configuration of Elasticsearch and Kibana

1. Elasticsearch

Nothing needs to be changed here; the default configuration is fine. The configuration file is config/elasticsearch.yml.

If you need to set an expiration time for the data, you can add these two lines (this looks right but has not been verified; readers can try it):

# expire after 30 days
indices.cache.filter.expire: 30d
index.cache.filter: 30d
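
As a quick sanity check (not part of the original instructions), node-level settings such as the ones above can be inspected over the REST API, where they would appear if they took effect:

curl 'localhost:9200/_nodes/settings?pretty'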

Elasticsearch listens on port 9200 by default, where it can be queried and managed, for example the health status of the indices:

curl 'localhost:9200/_cluster/health?level=indices&pretty'

Output

{" cluster_name ":" Elasticsearch "," status ":" Yellow "," timed_out ": false," numb Er_of_nodes ": 2," Number_of_data_nodes ": 1," active_primary_shards ": 161," Active_shards ": 161," relocating_sh
    Ards ": 0," Initializing_shards ": 0," unassigned_shards ": 161," Number_of_pending_tasks ": 0," indices ": { "logstash-2015.05.05": {"status" 
