Troubleshooting a problem on a flume line


Recently, in a distributed call-chain tracing system, Flume is used in two places: a Flume agent on each host for log collection, and a second tier that parses logs from Kafka and writes them to HBase.

After this second Flume tier (the one that parses Kafka logs) went online with 3 nodes, it started throwing this exception:

Caused by: org.apache.flume.ChannelException: Put queue for MemoryTransaction of capacity full, consider committing more frequently, increasing capacity or increasing thread count
        at org.apache.flume.channel.MemoryChannel$MemoryTransaction.doPut(MemoryChannel.java:84)
        at org.apache.flume.channel.BasicTransactionSemantics.put(BasicTransactionSemantics.java:93)
        at org.apache.flume.channel.BasicChannelSemantics.put(BasicChannelSemantics.java:80)
        at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:189)

The intuitive reading of the exception message is that the put queue of a MemoryChannel transaction is full. Why would that happen?
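The failure mode the exception describes can be sketched with a bounded double-ended queue. This is a minimal illustration, not Flume's actual code; the class and method names are hypothetical (the real logic lives in MemoryChannel's MemoryTransaction.doPut):

```java
import java.util.concurrent.LinkedBlockingDeque;

// Hypothetical sketch: the put side of a memory channel is backed by a
// bounded deque. When sinks drain it more slowly than the source fills it,
// offer() fails and an exception like the one above is raised.
class BoundedPutQueue {
    private final LinkedBlockingDeque<byte[]> queue;

    BoundedPutQueue(int capacity) {
        this.queue = new LinkedBlockingDeque<>(capacity);
    }

    void put(byte[] event) {
        // offer() returns false instead of blocking when the deque is full
        if (!queue.offer(event)) {
            throw new IllegalStateException(
                "Put queue full, consider committing more frequently or increasing capacity");
        }
    }

    byte[] poll() {
        return queue.poll(); // the sink side drains the queue
    }
}
```

If the sink side keeps up, put() never fails; the exception only appears once the deque reaches its fixed capacity.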

Let's start with the Flume architecture. Flume is an Apache open-source tool for log collection and transport, characterized by flexible configuration for moving data between different data stores. Even a single agent can implement a log-collection platform (I will summarize that in a later article).

It has three core components:

Source: responsible for fetching data from the data source. There are two kinds: EventDrivenSource and PollableSource. The former is event-driven; as the name implies, it requires the external system to actively push data to it, e.g. AvroSource and ThriftSource. A PollableSource actively pulls data from the data source, e.g. KafkaSource. When a source gets data, it writes an event to the channel. A Flume event has two parts, headers and body; the headers are a map of key-value pairs.

Sink: responsible for pulling events from the channel and writing them to downstream storage, or handing them off to another agent.

Channel: buffers data between source and sink. The two main types are the file channel and the memory channel.
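The event structure described above (a headers map plus a raw byte[] body) can be sketched as follows. This is an illustrative stand-in, not Flume's real org.apache.flume.Event interface; the class name SimpleEvent here is hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a Flume-style event: headers (key-value metadata)
// plus a byte[] body (the raw payload).
class SimpleEvent {
    private final Map<String, String> headers = new HashMap<>();
    private byte[] body = new byte[0];

    Map<String, String> getHeaders() { return headers; }
    byte[] getBody() { return body; }
    void setBody(byte[] body) { this.body = body; }

    public static void main(String[] args) {
        SimpleEvent e = new SimpleEvent();
        e.getHeaders().put("topic", "nagual_topic");          // metadata
        e.setBody("trace log line".getBytes(StandardCharsets.UTF_8)); // payload
        System.out.println(e.getHeaders() + " / " + e.getBody().length + " bytes");
    }
}
```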

The architecture diagram for Flume is as follows:

My Flume configuration is as follows:

a1.sources = kafkaSource
a1.sinks = hdfsSink hbaseSink
a1.channels = hdfsChannel hbaseChannel

a1.sources.kafkaSource.channels = hdfsChannel hbaseChannel

a1.sinks.hdfsSink.channel = hdfsChannel
a1.sinks.hbaseSink.channel = hbaseChannel

a1.sources.kafkaSource.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.kafkaSource.zookeeperConnect = zk1:2181,zk2:2181,zk3:2181
a1.sources.kafkaSource.topic = nagual_topic
a1.sources.kafkaSource.groupId = flume
a1.sources.kafkaSource.kafka.consumer.timeout.ms = 500

a1.sinks.hdfsSink.type = hdfs
a1.sinks.hdfsSink.hdfs.filePrefix = events-prefix
a1.sinks.hdfsSink.hdfs.roundValue =
a1.sinks.hdfsSink.hdfs.roundUnit = minute
a1.sinks.hdfsSink.hdfs.fileType = SequenceFile
a1.sinks.hdfsSink.hdfs.rollSize = -1

a1.sinks.hbaseSink.type = hbase
a1.sinks.hbaseSink.table = htable_nagual_tracelog
a1.sinks.hbaseSink.index_table = htable_nagual_tracelog_index
a1.sinks.hbaseSink.serializer = NagualTraceLogEventSerializer
a1.sinks.hbaseSink.columnFamily = rpcid
a1.sinks.hbaseSink.zookeeperQuorum = zk1:2181,zk2:2181,zk3:2181

a1.channels.hdfsChannel.type = memory
a1.channels.hdfsChannel.capacity = 10000
a1.channels.hdfsChannel.byteCapacityBufferPercentage = 20
a1.channels.hdfsChannel.byteCapacity = 536870912

In short, my Flume agent pulls logs from Kafka and converts them into HBase row puts, with a memory channel in between. So why the exception above? I spent an afternoon reading through the Flume source and basically found where the problem lies.

Let's split the source-code analysis into the following main steps:

1. Flume startup:

The main flow for starting the whole Flume agent is this:

The flume-ng startup script under FLUME_HOME starts Application. Application creates a PollingPropertiesFileConfigurationProvider, whose job is to start a configuration-file monitoring thread, FileWatcherRunnable, that periodically checks the configuration file for changes.

Once the configuration file changes, the SinkRunner, SourceRunner, and channel configurations are repackaged into a MaterializedConfiguration and pushed to Application via Google Guava's EventBus. Application then starts a LifecycleSupervisor, which is responsible for monitoring the running state of the SourceRunner, SinkRunner, and channel.

These components (the ones marked with green boxes) all implement or inherit the LifecycleAware interface, and the way they are monitored is worth noting: a timer periodically checks whether each component's desired state matches its current state, and if not, calls the method corresponding to the desired state.

For example, at startup the desired state of the SinkRunner is RUNNING, so the supervisor calls the SinkRunner's start method.
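The desired-state-vs-actual-state check described above can be sketched as follows. This is a simplified illustration of the idea, assuming made-up names; Flume's real LifecycleSupervisor and LifecycleAware are more involved:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the LifecycleSupervisor idea: on a timer, compare
// each component's desired state with its current state and, on a mismatch,
// call the method corresponding to the desired state.
class MiniSupervisor {
    enum State { IDLE, RUNNING, STOPPED }

    interface Lifecycle {
        void start();
        void stop();
        State getState();
    }

    private final Map<Lifecycle, State> desired = new HashMap<>();

    void supervise(Lifecycle component, State want) {
        desired.put(component, want);
    }

    // One pass of the periodic check the supervisor runs on a timer.
    void checkAll() {
        for (Map.Entry<Lifecycle, State> e : desired.entrySet()) {
            Lifecycle c = e.getKey();
            if (c.getState() != e.getValue()) {
                if (e.getValue() == State.RUNNING) c.start();
                else if (e.getValue() == State.STOPPED) c.stop();
            }
        }
    }
}
```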

The startup order is channel, then SinkRunner, then SourceRunner (picture laying the water pipe first, then placing the basin, and only then turning on the faucet).

Take my Flume configuration as an example, which uses a memory channel. During AbstractConfigurationProvider's configuration phase, a LinkedBlockingDeque is created (this queue is globally unique: a double-ended queue with a fixed maximum length). And when AbstractConfigurationProvider creates channels, a channel of a given name is only created once (see AbstractConfigurationProvider's getOrCreateChannel method). This also explains why the memory channel's byteCapacity configuration documentation contains this sentence:

Note that if you have multiple memory channels on a single JVM, and they happen to hold the same physical events

In other words, even if multiple memory channels are configured, they share a single double-ended queue.
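The create-once behavior of getOrCreateChannel can be sketched as a name-keyed registry. This is an illustrative analogy with hypothetical names, not Flume's actual implementation:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of the getOrCreateChannel idea: channels are cached
// by name, so asking for the same channel name twice returns the same
// instance rather than constructing a new one.
class ChannelRegistry<T> {
    private final Map<String, T> channels = new HashMap<>();

    T getOrCreate(String name, Supplier<T> factory) {
        return channels.computeIfAbsent(name, n -> factory.get());
    }
}
```

With this memoization, two lookups of "hdfsChannel" yield the same object, which is why separately configured memory channels can end up sharing the same underlying storage.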

Not finished, to be continued.
