Flume: spooldir log capture, Kafka output configuration issue

Flume configuration:
# DBFile
DBFile.sources = sources1
DBFile.sinks = sinks1
DBFile.channels = channels1

# DBFile-DB-source
DBFile.sources.sources1.type = spooldir
DBFile.sources.sources1.spoolDir = /var/log/apache/flumeSpool/DB
DBFile.sources.sources1.inputCharset = utf-8

# DBFile-sink
DBFile.sinks.sinks1.type = org.apache.flume.sink.kafka.KafkaSink
DBFile.sinks.sinks1.topic = DBFile
DBFile.sinks.sinks1.brokerList = hdp01:6667,hdp02:6667,hdp07:6667
DBFile.sinks.sinks1.requiredAcks = 1
DBFile.sinks.sinks1.batchSize =

# DBFile-channel
DBFile.channels.channels1.type = memory
DBFile.channels.channels1.capacity = 10000
DBFile.channels.channels1.transactionCapacity = 1000

# Bind DBFile-source and sink to the channel
DBFile.sources.sources1.channels = channels1
DBFile.sinks.sinks1.channel = channels1
Symptom: the first file dropped into the spooling directory is processed quickly, but files uploaded afterwards sit unprocessed or show as not processed (the spooling directory source normally renames fully ingested files with a .COMPLETED suffix, so unprocessed files are easy to spot). After restarting the Flume service, the pending files are processed immediately.
After testing, the problem was traced to this setting: DBFile.sinks.sinks1.requiredAcks = -1.
The official description of requiredAcks: how many replicas must acknowledge a message before it is considered successfully written. Accepted values are 0 (never wait for acknowledgement), 1 (wait for the leader only), and -1 (wait for all replicas). Set this to -1 to avoid data loss in some cases of leader failure.
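For context, Flume's KafkaSink passes this value through to the acknowledgement setting of the Kafka producer it wraps (in this era, the older producer's request.required.acks). As a rough standalone illustration, not Flume's actual sink code, the same three acknowledgement levels look like this with the current kafka-clients producer API, reusing the topic and broker list from the config above:

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AcksDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Broker list taken from the Flume config above
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
                  "hdp01:6667,hdp02:6667,hdp07:6667");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");

        // "0"   = fire and forget, never wait for acknowledgement
        // "1"   = wait for the partition leader only (the fix used here)
        // "all" = equivalent to -1, wait for all in-sync replicas
        props.put(ProducerConfig.ACKS_CONFIG, "1");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Same topic the Flume sink writes to
            producer.send(new ProducerRecord<>("DBFile", "test event"));
        }
    }
}

With acks=1 the send completes as soon as the partition leader has the record, so a slow or missing follower replica cannot hold up delivery.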
The fix is to change this value to 1, trading the stronger durability guarantee for leader-only acknowledgement; presumably, with -1 the sink was left blocked waiting for replica acknowledgements that were slow to arrive, and a restart merely reset the stuck producer.
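Concretely, the one-line change in the sink configuration, already reflected in the config shown at the top:

DBFile.sinks.sinks1.requiredAcks = 1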