Troubleshooting a problem on a flume line


Recently, in a distributed call-chain tracing system, Flume is used in two places: a Flume agent on each host for log collection, and a second tier that parses logs from Kafka and writes them to HBase.

After this second Flume tier (the one that parses Kafka logs) went online with 3 nodes, it started throwing this exception:

Caused by: org.apache.flume.ChannelException: Put queue for MemoryTransaction of capacity full, consider committing more frequently, increasing capacity or increasing thread count
        at org.apache.flume.channel.MemoryChannel$MemoryTransaction.doPut(MemoryChannel.java:84)
        at org.apache.flume.channel.BasicTransactionSemantics.put(BasicTransactionSemantics.java:93)
        at org.apache.flume.channel.BasicChannelSemantics.put(BasicChannelSemantics.java:80)
        at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:189)

The intuitive reading of the exception message is that the put queue of a MemoryChannel transaction is full. Why would that happen?
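The failure mode the exception describes can be sketched with a bounded double-ended queue. This is a minimal illustration, not Flume's actual code; the class and method names are hypothetical (the real logic lives in MemoryChannel's MemoryTransaction.doPut):

```java
import java.util.concurrent.LinkedBlockingDeque;

// Hypothetical sketch: the put side of a memory channel is backed by a
// bounded deque. When sinks drain it more slowly than the source fills it,
// offer() fails and an exception like the one above is raised.
class BoundedPutQueue {
    private final LinkedBlockingDeque<byte[]> queue;

    BoundedPutQueue(int capacity) {
        this.queue = new LinkedBlockingDeque<>(capacity);
    }

    void put(byte[] event) {
        // offer() returns false instead of blocking when the deque is full
        if (!queue.offer(event)) {
            throw new IllegalStateException(
                "Put queue full, consider committing more frequently or increasing capacity");
        }
    }

    byte[] poll() {
        return queue.poll(); // the sink side drains the queue
    }
}
```

If the sink side keeps up, put() never fails; the exception only appears once the deque reaches its fixed capacity.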

Let's start with the Flume architecture. Flume is an Apache open-source tool for log collection and transport, characterized by flexible configuration for moving data between different data stores. Even a single agent can implement a log-collection platform (I will summarize that in a later article).

It has three core components:

Source: responsible for fetching data from the data source. There are two kinds: EventDrivenSource and PollableSource. The former is event-driven; as the name implies, it requires the external system to actively push data to it, e.g. AvroSource and ThriftSource. A PollableSource actively pulls data from the data source, e.g. KafkaSource. When a source gets data, it writes an event to the channel. A Flume event has two parts, headers and body; the headers are a map of key-value pairs.

Sink: responsible for pulling events from the channel and writing them to downstream storage, or handing them off to another agent.

Channel: buffers data between source and sink. The two main types are the file channel and the memory channel.
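The event structure described above (a headers map plus a raw byte[] body) can be sketched as follows. This is an illustrative stand-in, not Flume's real org.apache.flume.Event interface; the class name SimpleEvent here is hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a Flume-style event: headers (key-value metadata)
// plus a byte[] body (the raw payload).
class SimpleEvent {
    private final Map<String, String> headers = new HashMap<>();
    private byte[] body = new byte[0];

    Map<String, String> getHeaders() { return headers; }
    byte[] getBody() { return body; }
    void setBody(byte[] body) { this.body = body; }

    public static void main(String[] args) {
        SimpleEvent e = new SimpleEvent();
        e.getHeaders().put("topic", "nagual_topic");          // metadata
        e.setBody("trace log line".getBytes(StandardCharsets.UTF_8)); // payload
        System.out.println(e.getHeaders() + " / " + e.getBody().length + " bytes");
    }
}
```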

The architecture diagram for Flume is as follows:

My Flume configuration is as follows:

a1.sources = kafkaSource
a1.sinks = hdfsSink hbaseSink
a1.channels = hdfsChannel hbaseChannel

a1.sources.kafkaSource.channels = hdfsChannel hbaseChannel

a1.sinks.hdfsSink.channel = hdfsChannel
a1.sinks.hbaseSink.channel = hbaseChannel

a1.sources.kafkaSource.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.kafkaSource.zookeeperConnect = zk1:2181,zk2:2181,zk3:2181
a1.sources.kafkaSource.topic = nagual_topic
a1.sources.kafkaSource.groupId = flume
a1.sources.kafkaSource.kafka.consumer.timeout.ms = 500

a1.sinks.hdfsSink.type = hdfs
a1.sinks.hdfsSink.hdfs.filePrefix = events-prefix
a1.sinks.hdfsSink.hdfs.roundValue =
a1.sinks.hdfsSink.hdfs.roundUnit = minute
a1.sinks.hdfsSink.hdfs.fileType = SequenceFile
a1.sinks.hdfsSink.hdfs.rollSize = -1

a1.sinks.hbaseSink.type = hbase
a1.sinks.hbaseSink.table = htable_nagual_tracelog
a1.sinks.hbaseSink.index_table = htable_nagual_tracelog_index
a1.sinks.hbaseSink.serializer = NagualTraceLogEventSerializer
a1.sinks.hbaseSink.columnFamily = rpcid
a1.sinks.hbaseSink.zookeeperQuorum = zk1:2181,zk2:2181,zk3:2181

a1.channels.hdfsChannel.type = memory
a1.channels.hdfsChannel.capacity = 10000
a1.channels.hdfsChannel.byteCapacityBufferPercentage = 20
a1.channels.hdfsChannel.byteCapacity = 536870912

In short, my Flume agent pulls logs from Kafka and converts them into HBase row puts, with a memory channel in between. So why the exception above? I spent an afternoon reading through the Flume source and basically found where the problem lies.

Let's split the source-code analysis into the following main steps:

1. Flume startup:

The main flow for starting the whole Flume agent is this:

The flume-ng startup script under FLUME_HOME starts Application. Application creates a PollingPropertiesFileConfigurationProvider, whose job is to start a configuration-file monitoring thread, FileWatcherRunnable, that periodically checks the configuration file for changes.

Once the configuration file changes, the SinkRunner, SourceRunner, and channel configurations are repackaged into a MaterializedConfiguration and pushed to Application via Google Guava's EventBus. Application then starts a LifecycleSupervisor, which is responsible for monitoring the running state of the SourceRunner, SinkRunner, and channel.

These components (the ones marked with green boxes) all implement or inherit the LifecycleAware interface, and the way they are monitored is worth noting: a timer periodically checks whether each component's desired state matches its current state, and if not, calls the method corresponding to the desired state.

For example, at startup the desired state of the SinkRunner is RUNNING, so the supervisor calls the SinkRunner's start method.
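The desired-state-vs-actual-state check described above can be sketched as follows. This is a simplified illustration of the idea, assuming made-up names; Flume's real LifecycleSupervisor and LifecycleAware are more involved:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the LifecycleSupervisor idea: on a timer, compare
// each component's desired state with its current state and, on a mismatch,
// call the method corresponding to the desired state.
class MiniSupervisor {
    enum State { IDLE, RUNNING, STOPPED }

    interface Lifecycle {
        void start();
        void stop();
        State getState();
    }

    private final Map<Lifecycle, State> desired = new HashMap<>();

    void supervise(Lifecycle component, State want) {
        desired.put(component, want);
    }

    // One pass of the periodic check the supervisor runs on a timer.
    void checkAll() {
        for (Map.Entry<Lifecycle, State> e : desired.entrySet()) {
            Lifecycle c = e.getKey();
            if (c.getState() != e.getValue()) {
                if (e.getValue() == State.RUNNING) c.start();
                else if (e.getValue() == State.STOPPED) c.stop();
            }
        }
    }
}
```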

The startup order is channel, then SinkRunner, then SourceRunner (picture laying the water pipe first, then placing the basin, and only then turning on the faucet).

Take my Flume configuration as an example, which uses a memory channel. During AbstractConfigurationProvider's configuration phase, a LinkedBlockingDeque is created (this queue is globally unique: a double-ended queue with a fixed maximum length). And when AbstractConfigurationProvider creates channels, a channel of a given name is only created once (see AbstractConfigurationProvider's getOrCreateChannel method). This also explains why the memory channel's byteCapacity configuration documentation contains this sentence:

Note that if you have multiple memory channels on a single JVM, and they happen to hold the same physical events

In other words, even if multiple memory channels are configured, they share a single double-ended queue.
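The create-once behavior of getOrCreateChannel can be sketched as a name-keyed registry. This is an illustrative analogy with hypothetical names, not Flume's actual implementation:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of the getOrCreateChannel idea: channels are cached
// by name, so asking for the same channel name twice returns the same
// instance rather than constructing a new one.
class ChannelRegistry<T> {
    private final Map<String, T> channels = new HashMap<>();

    T getOrCreate(String name, Supplier<T> factory) {
        return channels.computeIfAbsent(name, n -> factory.get());
    }
}
```

With this memoization, two lookups of "hdfsChannel" yield the same object, which is why separately configured memory channels can end up sharing the same underlying storage.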

Not finished, to be continued.
