First of all, installation of the tools is not covered here; there are plenty of guides online that you can follow on your own.
Here we use an example to walk through the configuration of each tool and the final result.
Suppose we have a batch of tracklog logs that need to be displayed in near real time through the ELK stack:
I. Collect the logs with Flume
An agent is deployed on each log server and sends the logs to a collector; the configuration is as follows:
Agent (there can be more than one):
agent.sources = s1
agent.channels = m1
agent.sinks = k1

agent.sources.s1.interceptors = i1
agent.sources.s1.interceptors.i1.type = org.apache.flume.interceptor.HostBodyInterceptor$Builder
# For each one of the sources, the type is defined
agent.sources.s1.type = com.source.tailDir.TailDirSourceNG
agent.sources.s1.monitorpath = d:\\trackloguc
agent.sources.s1.channels = m1
agent.sources.s1.fileencode = gb2312

# Each sink's type must be defined
agent.sinks.k1.type = avro
agent.sinks.k1.hostname = 10.130.2.249
agent.sinks.k1.port = 26003
#agent.sinks.k1.type = logger
agent.sinks.k1.channel = m1

# Each channel's type is defined
#agent.channels.m1.type = memory
#agent.channels.m1.capacity = 100000
agent.channels.m1.type = file
agent.channels.m1.checkpointDir = .\\mobilecheck
agent.channels.m1.dataDirs = .\\mobiledata
agent.channels.m1.transactionCapacity = 3000000
agent.channels.m1.capacity = 10000000
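For reference, a sketch of how this agent might be started, assuming the properties above are saved as tracklog-agent.properties (a hypothetical file name) and FLUME_HOME points at the Flume installation; on a Windows log server the corresponding wrapper script shipped with the Flume distribution would be used instead:

# Start the agent; --name must match the "agent." prefix used in the property keys
${FLUME_HOME}/bin/flume-ng agent --conf ${FLUME_HOME}/conf --conf-file tracklog-agent.properties --name agent -Dflume.root.logger=INFO,console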
Collector:
agent.sources = s1
agent.channels = m1 m2
agent.sinks = k1 k2

agent.sources.s1.selector.type = replicating
# For each one of the sources, the type is defined
agent.sources.s1.type = avro
agent.sources.s1.bind = 10.130.2.249
agent.sources.s1.port = 26002
agent.sources.s1.channels = m1 m2

# Send the events to Kafka
agent.sinks.k1.type = org.apache.flume.plugins.KafkaSink
agent.sinks.k1.metadata.broker.list = bdc53.hexun.com:9092,bdc54.hexun.com:9092,bdc46.hexun.com:9092
agent.sinks.k1.serializer.class = kafka.serializer.StringEncoder
agent.sinks.k1.request.required.acks = 0
agent.sinks.k1.max.message.size = 100
agent.sinks.k1.producer.type = sync
agent.sinks.k1.custom.encoding = UTF-8
agent.sinks.k1.custom.topic.name = tracklogt
agent.sinks.k1.channel = m2

# The channel uses the file type because the log volume is too large
agent.channels.m1.type = file
agent.channels.m1.checkpointDir = /opt/modules/apache-flume-1.5.2-bin/tracklog-kafka/checkpoint
agent.channels.m1.dataDirs = /opt/modules/apache-flume-1.5.2-bin/tracklog-kafka/datadir
agent.channels.m1.transactionCapacity = 1000000
agent.channels.m1.capacity = 1000000
agent.channels.m1.checkpointInterval = 30000
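The collector side can be started the same way, assuming the properties above are saved as tracklog-collect.properties (a hypothetical file name) under the Flume installation already referenced in the channel paths:

# Start the collector agent on the Linux collection host
/opt/modules/apache-flume-1.5.2-bin/bin/flume-ng agent --conf /opt/modules/apache-flume-1.5.2-bin/conf --conf-file tracklog-collect.properties --name agent -Dflume.root.logger=INFO,console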
II. Get the data into Kafka
The topic used by the collector above has to be created in Kafka beforehand; everything else needed to get the data into Kafka is already handled by the collector configuration.
Reference command for creating the topic:
${KAFKA_HOME}/bin/kafka-topics.sh --create --zookeeper bdc41.hexun.com --replication-factor 3 --partitions 3 --topic tracklogt
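To confirm the topic was created as expected, it can be described with, for example:

# List partitions, replicas and leaders for the tracklogt topic
${KAFKA_HOME}/bin/kafka-topics.sh --describe --zookeeper bdc41.hexun.com --topic tracklogt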
Reference command for viewing the topic's data:
${KAFKA_HOME}/bin/kafka-console-consumer.sh --zookeeper bdc46.hexun.com:2181,bdc40.hexun.com:2181,bdc41.hexun.com:2181 --topic tracklogt
III. From Kafka to Elasticsearch
We use Logstash to move the Kafka data into ES, mainly because Logstash integrates closely with ES and Kibana.
To load the data from the tracklogt topic into ES, the Logstash configuration is as follows:
input {
  kafka {
    zk_connect => "bdc41.hexun.com:2181,bdc40.hexun.com:2181,bdc46.hexun.com:2181,bdc54.hexun.com:2181,bdc53.hexun.com:2181"
    group_id => "logstash"
    topic_id => "tracklogt"
    reset_beginning => false   # boolean (optional)
    consumer_threads => 5      # number (optional)
    decorate_events => true
  }
}
filter {
  # multiline merges several physical lines into one event; lines are matched with a regular expression
  multiline {
    pattern => "^\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}\s\d{4}-\d{1,2}-\d{1,2}\s\d{2}:\d{2}:\d{2}"
    negate => true
    what => "previous"
  }
  # Split each line on spaces and map the columns to named fields
  ruby {
    init => "@kname = ['hostip','dateday','datetime','ip','cookieid','userid','logserverip','referer','requesturl','remark1','remark2','alexaflag','ua','wirelessflag']"
    code => "event.append(Hash[@kname.zip(event['message'].split(/ /))])"
    remove_field => ["message"]
    add_field => {
      "logsdate" => "%{dateday}"
    }
  }
  # Replace the '-' in the logsdate field with an empty string
  mutate {
    gsub => ["logsdate", "-", ""]
    # convert => { "dateday" => "integer" }
  }
  # Drop records whose logsdate does not match the expected format
  if [logsdate] !~ /\d{8}/ {
    drop {}
  }
  # Resolve public IP addresses; geo-location information is obtained automatically
  geoip {
    source => "ip"
    # type => "linux-syslog"
    add_tag => ["geoip"]
  }
  # Parse the user agent
  useragent {
    source => "ua"
    # type => "linux-syslog"
    add_tag => ["useragent"]
  }
}
output {
  # Write into Elasticsearch
  elasticsearch {
    hosts => ["10.130.2.53:9200", "10.130.2.46:9200", "10.130.2.54:9200"]
    flush_size => 50000
    workers => 5
    index => "logstash-tracklog"
  }
}
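Assuming the configuration above is saved as kafka-tracklog.conf (a hypothetical file name), Logstash can then be started with something like:

# Run Logstash with the Kafka-to-ES pipeline configuration
${LOGSTASH_HOME}/bin/logstash -f kafka-tracklog.conf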
Points to note:
1. logsdate is rewritten because a value such as 2016-01-01 is treated as a time when it enters ES and gets auto-completed to 2016-01-01 08:00:00, which makes per-day views of the field in Kibana incorrect.
2. Some records are abnormal: the logsdate column should be a numeric date such as 20160101, but occasionally contains alphabetic characters. Such records cause display problems in Kibana, so they are dropped.
3. Different business data come in different formats, so the data has to be processed in the filter section with the appropriate plugins and syntax; the official Logstash documentation is worth reading for this.
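When in doubt about how ES actually mapped a field (for example, whether logsdate ended up as a string or a date), a simple check is to query the mapping of the index defined in the output section above; the host below is one of the ES nodes from that section:

# Show the field mappings of the logstash-tracklog index
curl -XGET 'http://10.130.2.53:9200/logstash-tracklog/_mapping?pretty'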
IV. Display the data in Kibana
The following is a usage example, for reference only:
1. First open the Kibana page and click the menu "Settings" -> "Indices".
@ You can enter a wildcard-style name here so that one pattern covers multiple indexes (the data is typically indexed by day).
Then click Create. (To confirm which indices the wildcard actually matches, see the curl sketch after this step.)
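As a quick check (a sketch, assuming the same ES nodes as above), the indices matched by a wildcard such as logstash-* can be listed directly from Elasticsearch:

# List all indices matching the logstash-* pattern, with a header row
curl -XGET 'http://10.130.2.53:9200/_cat/indices/logstash-*?v'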
2. Click the menu "Discover" and select the index pattern you just created; you should see something like the following:
@ Then click Save in the upper right corner and enter a name.
@ This saved search is the data source used in the charts below, and you can also search your data here; note that it is best to wrap string values in double quotation marks, as in the query sketch after this step.
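For example, a search restricted to a single field could look like the following; the field name comes from the ruby filter above, while the value is purely hypothetical:

referer:"http://www.hexun.com/"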
3. Click "Visualize" to make various icons.
You can choose which kind of chart to make, such as a histogram of daily statistics, and click the last one.
@order by The field type must be Date or int This is why it is important to emphasize the data type in the previous guide.
4. Finally click on the "DashBoard" menu to make the dashboard, you can set the previous discover and visualize the data and graphs stored in this instrument panel.