Logstash is a data analysis tool primarily designed to process log data. The whole stack can be thought of as an MVC model: Logstash is the controller layer, Elasticsearch the model layer, and Kibana the view layer.
First, data is passed to Logstash, which filters it and formats it as JSON, then hands it to Elasticsearch for storage and search indexing. Kibana provides the front-end pages for search and chart visualization; it calls the Elasticsearch interface and visualizes the data it returns. Logstash and Elasticsearch run on the JVM, while Kibana is built on Node.js.
The official website, https://www.elastic.co/, has very detailed usage instructions; besides the docs there are also video tutorials. This post collects some of the more important settings and usages from the docs and videos.

I. Configuration of Logstash

1. Define the data source
Write a configuration file that can be named logstash.conf and enter the following:
input {
    file {
        path => "/data/web/logstash/logfile/*/*"
        start_position => "beginning" # read from the beginning of the file
    }
    # stdin {} # data can also be read from standard input
}
The supported data sources include files, stdin, Kafka, Twitter, and so on; you can even write an input plugin yourself. If you specify the file path with a wildcard as above, Logstash will automatically pick up any new log file copied into the directory.
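For illustration, a minimal Kafka input might look like the sketch below. This assumes the logstash-input-kafka plugin is installed; the broker address and topic name are made-up placeholders, and older plugin versions take zk_connect and topic_id instead of these options:

input {
    kafka {
        bootstrap_servers => "localhost:9092" # assumed broker address
        topics => ["weblogs"]                 # hypothetical topic name
    }
}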
2. Define the format of the data
Match the fields with regular expressions according to the format of the log:
filter {
    # define the format of the data
    grok {
        match => { "message" => "%{DATA:timestamp}\|%{IP:serverip}\|%{IP:clientip}\|%{DATA:logsource}\|%{DATA:userid}\|%{DATA:requrl}\|%{DATA:requri}\|%{DATA:refer}\|%{DATA:device}\|%{DATA:textduring}\|%{DATA:duringtime:int}\|\|" }
    }
}
This is because the log records look like this:
2015-05-07-16:03:04|10.4.29.158|120.131.74.116|web|11299073|http://quxue.renren.com/shareapp?isappinstalled=0&userid=11299073&from=groupmessage|/shareapp|null|Mozilla/5.0 (iPhone; CPU iPhone OS 8_2 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Mobile/12D508 MicroMessenger/6.1.5 NetType/WIFI|duringtime|98||
The fields are separated by the | symbol: the first is the access time, a timestamp used as the Logstash timestamp, followed by the server IP, the client IP, the machine type (web/app/admin), the user's ID (0 if none), the full URL of the request, the requested controller path, the referer, the device information, the literal text "duringtime", and the time the request took.
As in the code above, each field is defined in turn and matched with a regular expression; DATA is a pattern predefined by Logstash (essentially the non-greedy .*?), and the part after the colon names the field.
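For reference, these definitions come from the grok-patterns file bundled with Logstash (the IP pattern used above is defined there as well):

DATA .*?
GREEDYDATA .*
IP (?:%{IPV6}|%{IPV4})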
We take the access time as the Logstash timestamp; based on it we can inspect how requests in a given period were parsed, and if this time cannot be matched, Logstash uses the current time as that record's timestamp instead. Inside the filter you need to define the timestamp's format, i.e. the format it has in the log:
filter {
    # define the format of the data
    grok { # same as above ... }
    # define the format of the timestamp
    date {
        match => ["timestamp", "yyyy-MM-dd-HH:mm:ss"]
        locale => "cn"
    }
}
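Note that the match string of the date filter uses Joda-Time tokens, so case is significant:

yyyy  four-digit year
MM    month of year
dd    day of month
HH    hour of day (0-23)
mm    minute of hour
ss    second of minute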
Among the fields above you also need to tell Logstash which one is the client IP; Logstash will then automatically look up the location information for that IP:
filter {
    # define the format of the data
    grok { # same as above }
    # define the format of the timestamp
    date { # same as above }
    # define which field is the client IP (as defined in the data format above)
    geoip {
        source => "clientip"
    }
}
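To give a sense of the result, the geoip filter attaches a geoip sub-object to each event, roughly like the sketch below; the field names come from the filter's bundled GeoIP database, while the location values here are invented for illustration:

"geoip" => {
    "ip"           => "120.131.74.116",
    "country_name" => "China",      # illustrative value
    "city_name"    => "Beijing",    # illustrative value
    "location"     => [116.4, 39.9] # [longitude, latitude]
}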
The same goes for the client's UA: because UAs come in many formats, Logstash also analyzes them automatically and extracts the operating system and other related information:
    # define which field is the client device
    useragent {
        source => "device"
        target => "userdevice"
    }
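Parsing the UA from the sample log line would yield a userdevice sub-object with fields along these lines; the field names are those produced by the useragent filter, while the values are illustrative, since the exact browser name depends on the UA database version:

"userdevice" => {
    "name"  => "MicroMessenger", # illustrative value
    "os"    => "iOS 8.2",        # illustrative value
    "major" => "6",
    "minor" => "1"
}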
You also need to tell Logstash which fields are integers, so they can be sorted in later analysis; in this data there is only one, the request time:
    # fields that need conversion; here the request time is converted to int before being passed to Elasticsearch
    mutate {
        convert => ["duringtime", "integer"]
    }
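As an aside, newer Logstash releases also accept a hash form of convert, equivalent to the array form above:

mutate {
    convert => { "duringtime" => "integer" }
}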
3. Output Configuration
Finally, configure the output, which sends the filtered data to Elasticsearch:
output {
    # save the output to Elasticsearch; records whose timestamp was not matched are not saved,
    # because URL parameters in the log can contain newlines
    if [timestamp] =~ /^\d{4}-\d{2}-\d{2}/ {
        elasticsearch { host => localhost }
    }
    # output to stdout
    # stdout { codec => rubydebug }
    # define the username and password for accessing the data
    # user => webservice
    # password => 1q2w3e4r
}
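A version note: in Logstash 2.x and later the elasticsearch output takes a hosts array instead of host, so the equivalent line would be roughly:

elasticsearch { hosts => ["localhost:9200"] }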
Save the above configuration as logstash.conf, then run Logstash:
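A typical invocation from the Logstash installation directory (the config file path here is an example):

bin/logstash -f logstash.conf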
After Logstash has started, feed it the access record above and Logstash will output the filtered data:
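Assuming the stdout { codec => rubydebug } line above is uncommented, the printed event would look roughly like the sketch below; the values are derived from the sample log line, the message field is shortened here, and @timestamp is the access time converted to UTC:

{
       "message" => "2015-05-07-16:03:04|10.4.29.158|120.131.74.116|web|...",
    "@timestamp" => "2015-05-07T08:03:04.000Z",
     "timestamp" => "2015-05-07-16:03:04",
      "serverip" => "10.4.29.158",
      "clientip" => "120.131.74.116",
     "logsource" => "web",
        "userid" => "11299073",
    "duringtime" => 98,
         "geoip" => { ... },    # location fields as sketched earlier
    "userdevice" => { ... }     # UA fields as sketched earlier
}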
You can see that Logstash automatically looks up the IP's location and analyzes the device field of the request.

II. Configuration of Elasticsearch and Kibana

1. Elasticsearch
Nothing needs to be configured here; the defaults work. The configuration file is config/elasticsearch.yml.
If you need the data to expire after some time, you can add these two lines (found by eyeballing the docs and not verified; readers can try it for themselves):
# set data to expire after 30 days
indices.cache.filter.expire: 30d
index.cache.filter: 30d
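Since Logstash indices are named by day (e.g. logstash-2015.05.05, as in the health output further below), expired data can also be removed by deleting whole indices through the standard delete-index API, a sketch:

curl -XDELETE 'localhost:9200/logstash-2015.05.05'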
Elasticsearch listens on port 9200 by default; through it you can query and manage the cluster, for example checking the health status of the indices:
curl 'localhost:9200/_cluster/health?level=indices&pretty'
Output
{" cluster_name ":" Elasticsearch "," status ":" Yellow "," timed_out ": false," numb Er_of_nodes ": 2," Number_of_data_nodes ": 1," active_primary_shards ": 161," Active_shards ": 161," relocating_sh
Ards ": 0," Initializing_shards ": 0," unassigned_shards ": 161," Number_of_pending_tasks ": 0," indices ": { "logstash-2015.05.05": {"status"