Flume-Kafka-Logstash-Elasticsearch-Kibana Process Description



First of all, the installation of the tools is not covered in this write-up; there are plenty of guides online that you can consult on your own.

Here we use an example to illustrate the configuration of each tool and the final display effect.

Suppose we have a batch of tracklog logs that need to be displayed in ELK in real time:

First, collect the logs with Flume

An agent is deployed on each log server and forwards the data to a collector (collect), configured as follows:

Agent (there can be more than one):

agent.sources = s1
agent.channels = m1
agent.sinks = k1

agent.sources.s1.interceptors = i1
agent.sources.s1.interceptors.i1.type = org.apache.flume.interceptor.HostBodyInterceptor$Builder
# for each one of the sources, the type is defined
agent.sources.s1.type = com.source.tailDir.TailDirSourceNG
agent.sources.s1.monitorpath = d:\\trackloguc
agent.sources.s1.channels = m1
agent.sources.s1.fileencode = gb2312

# each sink's type must be defined
agent.sinks.k1.type = avro
agent.sinks.k1.hostname = 10.130.2.249
agent.sinks.k1.port = 26003
#agent.sinks.k1.type = logger
agent.sinks.k1.channel = m1

# each channel's type is defined
#agent.channels.m1.type = memory
#agent.channels.m1.capacity = 100000
agent.channels.m1.type = file
agent.channels.m1.checkpointDir = .\\mobilecheck
agent.channels.m1.dataDirs = .\\mobiledata
agent.channels.m1.transactionCapacity = 3000000
agent.channels.m1.capacity = 10000000
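With this configuration saved to a properties file, the agent can be started with the standard flume-ng launcher. A minimal sketch, assuming the file is saved as conf/agent.properties (the path and file name are only examples):

# assumes the agent configuration above was saved as conf/agent.properties
%{flume_home}/bin/flume-ng agent --conf conf --conf-file conf/agent.properties --name agent -Dflume.root.logger=INFO,console

The --name value must match the property prefix (agent) used in the file.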

Collect (the aggregation node):

agent.sources = s1
agent.channels = m1 m2
agent.sinks = k1 k2

agent.sources.s1.selector.type = replicating
# for each one of the sources, the type is defined
agent.sources.s1.type = avro
agent.sources.s1.bind = 10.130.2.249
agent.sources.s1.port = 26002
agent.sources.s1.channels = m1 m2

# sink k1 writes into Kafka
agent.sinks.k1.type = org.apache.flume.plugins.KafkaSink
agent.sinks.k1.metadata.broker.list = bdc53.hexun.com:9092,bdc54.hexun.com:9092,bdc46.hexun.com:9092
agent.sinks.k1.serializer.class = kafka.serializer.StringEncoder
agent.sinks.k1.request.required.acks = 0
agent.sinks.k1.max.message.size = 100
agent.sinks.k1.producer.type = sync
agent.sinks.k1.custom.encoding = utf-8
agent.sinks.k1.custom.topic.name = tracklogt
agent.sinks.k1.channel = m2

# the channel uses the file type because the log volume is too large
agent.channels.m1.type = file
agent.channels.m1.checkpointDir = /opt/modules/apache-flume-1.5.2-bin/tracklog-kafka/checkpoint
agent.channels.m1.dataDirs = /opt/modules/apache-flume-1.5.2-bin/tracklog-kafka/datadir
agent.channels.m1.transactionCapacity = 1000000
agent.channels.m1.capacity = 1000000
agent.channels.m1.checkpointInterval = 30000
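The collect node is started the same way; here the file name conf/collect.properties is again only an assumed example:

# assumes the collect configuration above was saved as conf/collect.properties
%{flume_home}/bin/flume-ng agent --conf conf --conf-file conf/collect.properties --name agent -Dflume.root.logger=INFO,console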

Second, get the data into Kafka

The topic used by the collect configuration above needs to be created in Kafka in advance; the rest of the path into Kafka has already been configured on the collect side.

Reference command for creating the topic:

%{kafka_home}/bin/kafka-topics.sh --create --zookeeper bdc41.hexun.com --replication-factor 3 --partitions 3 --topic tracklogt

Reference command for viewing the topic data:

%{kafka_home}/bin/kafka-console-consumer.sh --zookeeper bdc46.hexun.com:2181,bdc40.hexun.com:2181,bdc41.hexun.com:2181 --topic tracklogt
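To confirm that the topic was created with the expected partition and replica counts, the describe option of the same script can be used (a sketch; point it at your own ZooKeeper ensemble):

%{kafka_home}/bin/kafka-topics.sh --describe --zookeeper bdc41.hexun.com:2181 --topic tracklogt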

Third, from Kafka to Elasticsearch

We use Logstash to move the Kafka data into ES, mainly because Logstash integrates closely with ES and Kibana.

To get the data of the tracklogt topic into ES, the Logstash configuration is as follows:

input {
  kafka {
    zk_connect => "bdc41.hexun.com:2181,bdc40.hexun.com:2181,bdc46.hexun.com:2181,bdc54.hexun.com:2181,bdc53.hexun.com:2181"
    group_id => "logstash"
    topic_id => "tracklogt"
    reset_beginning => false   # boolean (optional)
    consumer_threads => 5      # number (optional)
    decorate_events => true
  }
}

filter {
  # multiline merges several physical lines into one event; the lines are matched with a regular expression
  multiline {
    pattern => "^\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}\s\d{4}-\d{1,2}-\d{1,2}\s\d{2}:\d{2}:\d{2}"
    negate => true
    what => "previous"
  }

  # split each line on spaces and map the pieces to named fields
  ruby {
    init => "@kname = ['hostip','dateday','datetime','ip','cookieid','userid','logserverip','referer','requesturl','remark1','remark2','alexaflag','ua','wirelessflag']"
    code => "event.append(Hash[@kname.zip(event['message'].split(/ /))])"
    remove_field => ["message"]
    add_field => {
      "logsdate" => "%{dateday}"
    }
  }

  # replace the "-" in the logsdate field with an empty string
  mutate {
    gsub => ["logsdate", "-", ""]
    # convert => { "dateday" => "integer" }
  }

  # drop records whose logsdate does not match the expected format
  if [logsdate] !~ /\d{8}/ {
    drop {}
  }

  # resolve public IP addresses; geo-location information is obtained automatically
  geoip {
    source => "ip"
    # type => "linux-syslog"
    add_tag => ["geoip"]
  }

  # parse the user agent
  useragent {
    source => "ua"
    # type => "linux-syslog"
    add_tag => ["useragent"]
  }
}

output {
  # write into ES
  elasticsearch {
    hosts => ["10.130.2.53:9200", "10.130.2.46:9200", "10.130.2.54:9200"]
    flush_size => 50000
    workers => 5
    index => "logstash-tracklog"
  }
}
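Assuming the configuration above is saved as kafka-to-es.conf (the file name is arbitrary), it can be syntax-checked and then started with the Logstash launcher; on the Logstash 1.5/2.x line used here the check flag is --configtest, while newer releases use -t / --config.test_and_exit:

%{logstash_home}/bin/logstash -f kafka-to-es.conf --configtest
%{logstash_home}/bin/logstash -f kafka-to-es.conf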

Points to note:

1. The "-" in logsdate is replaced because a value in the 2016-01-01 form, once written into ES, is treated as a date and auto-completed to something like 2016-01-01 08:00:00, which makes a Kibana view that groups this field by day incorrect.

2. The logsdate column should be a numeric date such as 20160101; abnormal records that contain alphabetic characters cause problems in the Kibana display, so such data is dropped.

3. Different business data come in different formats, so the data has to be processed in the filter section with the relevant plugins; for the related syntax, it is recommended to read the official Logstash documentation.
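A quick way to confirm the points above is to query ES directly: the _count API shows whether documents are arriving, and the _mapping API shows whether logsdate was indexed as a string rather than a date (the host below is one of the ES nodes from the output section):

curl 'http://10.130.2.53:9200/logstash-tracklog/_count?pretty'
curl 'http://10.130.2.53:9200/logstash-tracklog/_mapping?pretty'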

Fourth, data display in Kibana

The following is a usage example, for reference only:

1. First open the Kibana page and click the menu "Settings" - "Indices".

You can enter an index name in wildcard form so that multiple indexes can be monitored at once (the data is typically indexed by day).

Then click Create.

2. Click the "Discover" menu and select the index pattern you just created; you will see something like the following:

Then click Save in the upper right corner and enter a name.

This saved search is the data source used for the charts below, but you can also search your data here; note that it is best to put double quotation marks around string values.
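For example, a search on the fields produced by the ruby filter above might look like the line below (the values are made-up illustrations; the Kibana search bar uses the Lucene query-string syntax):

hostip:"10.130.2.249" AND logsdate:20160101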

3. Click "Visualize" to make various icons.


You can choose which kind of chart to make, such as a histogram of daily statistics, and click the last one.

@order by The field type must be Date or int This is why it is important to emphasize the data type in the previous guide.

4. Finally click on the "DashBoard" menu to make the dashboard, you can set the previous discover and visualize the data and graphs stored in this instrument panel.
