Target: using a Flume agent, pull the data out of Kafka and feed it into Elasticsearch.
Analysis: for the Flume agent to work, two pieces are needed: a Flume Kafka Source, responsible for reading data out of Kafka; and a Flume ElasticSearch Sink, responsible for writing the data into ElasticSearch.
The current Flume 1.5.2 already contains an ElasticSearchSink, so only the Flume Kafka source needs a custom implementation. From JIRA it is known that Flume 1.6.0 will include flume-ng-kafka-source; however, Flume 1.6.0 has not been released yet. What to do? Two roads: the flume-ng-kafka-source code in the Flume 1.6.0 branch, or the flume-ng-kafka-source project on GitHub.
The flume-ng-kafka-source part of the Flume 1.6.0 branch was chosen first; this code is already included in flume-ng-extends-source.

Compiling the code
Execute the command mvn clean package to obtain the jar package flume-ng-extends-source-x.x.x.jar.

Installing the plugin
Two kinds of jar packages:
Into lib: flume-ng-extends-source-x.x.x.jar
Into libext: kafka_2.9.2-0.8.2.0.jar, kafka-clients-0.8.2.0.jar, metrics-core-2.2.0.jar, scala-library-2.9.2.jar, zkclient-0.3.jar
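Flume looks for plugins under the plugins.d directory: each plugin directory holds its own jar in lib/ and its dependency jars in libext/. For the jars above, the layout could look like this (the plugin directory name is illustrative):

```
$FLUME_HOME/plugins.d/
└── flume-ng-extends-source/
    ├── lib/
    │   └── flume-ng-extends-source-x.x.x.jar
    └── libext/
        ├── kafka_2.9.2-0.8.2.0.jar
        ├── kafka-clients-0.8.2.0.jar
        ├── metrics-core-2.2.0.jar
        ├── scala-library-2.9.2.jar
        └── zkclient-0.3.jar
```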
Question: when packaging with Maven, how to export the current jar package together with the jar packages it depends on? Reference: thilinamb's Flume Kafka sink.
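One way to answer that packaging question: the maven-dependency-plugin's copy-dependencies goal copies a module's dependencies into a directory at package time, which can then be dropped into libext. A sketch for the pom.xml (the binding to the package phase and the output directory are choices, not something the original post specifies):

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-dependency-plugin</artifactId>
      <executions>
        <execution>
          <id>copy-dependencies</id>
          <phase>package</phase>
          <goals>
            <goal>copy-dependencies</goal>
          </goals>
          <configuration>
            <!-- dependency jars end up in target/libext -->
            <outputDirectory>${project.build.directory}/libext</outputDirectory>
            <includeScope>runtime</includeScope>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

After mvn clean package, the module's own jar goes to lib/ and everything under target/libext goes to libext/.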
Configuration

The agent is configured in a properties file. A sample configuration file:
# Kafka Source for retrieving data from the Kafka cluster.
agent.sources.seqGenSrc.type = com.github.ningg.flume.source.KafkaSource
#agent.sources.seqGenSrc.batchSize = 2
agent.sources.seqGenSrc.batchDurationMillis =
agent.sources.seqGenSrc.topic = good
agent.sources.seqGenSrc.zookeeperConnect = 168.7.2.164:2181,168.7.2.165:2181,168.7.2.166:2181
agent.sources.seqGenSrc.groupId = elasticsearch
#agent.sources.seqGenSrc.kafka.consumer.timeout.ms = 1000
#agent.sources.seqGenSrc.kafka.auto.commit.enable = false

# ElasticSearchSink for ElasticSearch.
agent.sinks.loggerSink.type = org.apache.flume.sink.elasticsearch.ElasticSearchSink
agent.sinks.loggerSink.indexName = flume
agent.sinks.loggerSink.indexType = log
agent.sinks.loggerSink.batchSize = 100
#agent.sinks.loggerSink.ttl = 5
agent.sinks.loggerSink.client = transport
agent.sinks.loggerSink.hostNames = 168.7.1.69:9300
#agent.sinks.loggerSink.client = rest
#agent.sinks.loggerSink.hostNames = 168.7.1.69:9200
#agent.sinks.loggerSink.serializer = org.apache.flume.sink.elasticsearch.ElasticSearchLogStashEventSerializer
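The sample above covers only the source and sink; a working agent also needs a channel plus the source/sink/channel bindings. A minimal sketch (the channel name and memory-channel sizing are illustrative, not from the original configuration):

```properties
agent.sources = seqGenSrc
agent.channels = memCh
agent.sinks = loggerSink

# buffer between KafkaSource and ElasticSearchSink
agent.channels.memCh.type = memory
agent.channels.memCh.capacity = 10000
agent.channels.memCh.transactionCapacity = 1000

# wire source and sink to the channel
agent.sources.seqGenSrc.channels = memCh
agent.sinks.loggerSink.channel = memCh
```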
Custom serializer

Goal: customize the ElasticSearchSink's serializer.
Phenomenon: after setting the ElasticSearchSink parameter batchSize = 1000, 120,000+ records appeared in the current ES index, while the source platform found that only 20,000 records had been produced so far. So the guess is that the KafkaSource passed all of the data under the specified topic in the Kafka cluster into ES.
Findings:
New configuration parameters in ElasticSearchSink:
indexNameBuilder = org.apache.flume.sink.elasticsearch.TimeBasedIndexNameBuilder: produces one index per day, named in the indexPrefix-yyyy-MM-dd pattern. The other option is org.apache.flume.sink.elasticsearch.SimpleIndexNameBuilder, which uses the indexPrefix directly (that is, whatever indexName is set to).
dateFormat = yyyy-MM-dd
timeZone = Etc/UTC
serializer = org.apache.flume.sink.elasticsearch.ElasticSearchLogStashEventSerializer: adds the key-value pairs from the Flume event header into a new @fields field. The other option is org.apache.flume.sink.elasticsearch.ElasticSearchDynamicSerializer, which writes the body and the header directly as a JSON string into ElasticSearch.
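Put together, the daily-index options above would look like this in the sink section (the exact property names for dateFormat/timeZone may vary with the Flume version; this follows the names used above):

```properties
agent.sinks.loggerSink.indexName = flume
agent.sinks.loggerSink.indexNameBuilder = org.apache.flume.sink.elasticsearch.TimeBasedIndexNameBuilder
# consumed by TimeBasedIndexNameBuilder: index becomes flume-2015-03-01, flume-2015-03-02, ...
agent.sinks.loggerSink.dateFormat = yyyy-MM-dd
agent.sinks.loggerSink.timeZone = Etc/UTC
agent.sinks.loggerSink.serializer = org.apache.flume.sink.elasticsearch.ElasticSearchLogStashEventSerializer
```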
Restart

If the Flume agent is terminated and then restarted, two questions arise: will the data in Kafka be sent to Elasticsearch repeatedly? Will some data in Kafka be missed, i.e., never sent to Elasticsearch?
Think through several cases: whether the Kafka consumer group already has a committed offset; and Kafka's periodic cleanup of data, such as a 3-day retention. The restart scenarios of the Flume agent need to be considered in more detail.
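With the ZooKeeper-based high-level consumer that this KafkaSource uses, restart behavior hinges on committed offsets: if the group has a committed offset, consumption resumes from it; if not (or if the offset points at data already cleaned up), the consumer's auto.offset.reset setting decides where to start. A sketch of the relevant settings, assuming (as the kafka.* lines in the sample suggest) that the source forwards kafka.-prefixed properties to the consumer:

```properties
# same groupId across restarts -> resume from the committed offset
agent.sources.seqGenSrc.groupId = elasticsearch
# no valid committed offset: smallest = start from the oldest retained data,
# largest = start from newly arriving data only
agent.sources.seqGenSrc.kafka.auto.offset.reset = smallest
# with auto-commit disabled, offsets are committed only after events are
# handed to the channel: at-least-once delivery, duplicates possible;
# with auto-commit enabled, offsets may advance past unsent events, so
# gaps (lost events) become possible after a crash
agent.sources.seqGenSrc.kafka.auto.commit.enable = false
```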