Kafka is currently a popular high-concurrency message middleware, widely used in data collection, real-time processing, and similar scenarios. While we enjoy its high concurrency and high reliability, we also have to face its potential problems, the most common being message loss and duplicate delivery.

Message loss: consider a message-driven push service that sends notifications to users' phones every morning. When traffic surges, Kafka may send data faster than the servers can absorb it; the network card saturates or the disk becomes busy, and messages may be dropped.
Solution: first, rate-limit the Kafka producer; second, enable the retry mechanism with a relatively long retry interval; finally, set acks=all on the producer.
Detection method: use the replay mechanism to locate where messages were lost.
The Kafka producer configuration is as follows:
props.put("compression.type", "gzip");
props.put("linger.ms", "50");              // illustrative value; the original did not specify one
props.put("acks", "all");
props.put("retries", 3);                   // illustrative value; the original was garbled
props.put("reconnect.backoff.ms", 20000);
props.put("retry.backoff.ms", 20000);
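The "rate limit" mentioned in the solution can be implemented on the sending path. Here is a minimal token-bucket sketch; the class and parameter names are my own, not part of Kafka:

```java
// Minimal token-bucket rate limiter for the producer's send path.
// All names here are illustrative; Kafka itself has no such class.
public class SendRateLimiter {
    private final long capacity;        // maximum burst size, in messages
    private final double refillPerMs;   // tokens added per millisecond
    private double tokens;
    private long lastRefill;

    public SendRateLimiter(long capacity, double refillPerMs) {
        this.capacity = capacity;
        this.refillPerMs = refillPerMs;
        this.tokens = capacity;
        this.lastRefill = System.currentTimeMillis();
    }

    // Returns true if one message may be sent now; otherwise the caller
    // should wait (or buffer the message) before calling producer.send().
    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerMs);
        lastRefill = now;
        if (tokens >= 1) {
            tokens -= 1;
            return true;
        }
        return false;
    }
}
```

The caller checks tryAcquire() before each send, so a traffic surge is smoothed out instead of saturating the broker's network card.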
Duplicate delivery: when consumers rebalance partitions, a consumer may start over from an earlier offset, so messages are delivered again. When consumption is very slow, a batch may not finish within one session cycle, and the heartbeat mechanism reports the consumer as failed.
Underlying root cause: the data has been consumed, but the offset was not committed.
Configuration issue: offset auto-commit is enabled.
Problem scenarios:
1. With offset auto-commit enabled, data is consumed and then the consumer thread is killed before the commit fires;
2. With offset auto-commit enabled, Kafka is shut down; if consumer.unsubscribe() is called before close(), some offsets may not have been committed, and the next restart will consume those messages again.
The most common cause of duplicate consumption is rebalancing: processing a batch takes so long that it exceeds Kafka's session timeout (30 seconds by default in 0.10.x), which triggers a rebalance. At that point there is a fair chance that some offsets were not committed, so those messages are consumed again after the rebalance.
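The usual knobs for avoiding rebalances caused by slow processing are the session timeout and the amount of work fetched per poll. An illustrative consumer configuration; the values are examples, not recommendations, and availability of each property depends on the client version:

```java
props.put("session.timeout.ms", "60000"); // example: raise above the worst-case batch processing time
props.put("max.poll.records", "100");     // example: fetch fewer records per poll so each batch finishes quickly
props.put("enable.auto.commit", "false"); // commit manually only after processing succeeds
```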
Deduplication: give each message a unique ID so that duplicates can be identified.

Kafka's message-loss and duplicate-consumption issues in more detail:
In synchronous mode, there are three ack settings that determine how safely a message is produced. With acks=1 (only the write to the leader is confirmed), if the leader partition happens to fail right after the write, the data is lost.
Messages can also be lost in asynchronous mode: when the buffer is full and the timeout is configured as 0 (do not wait for confirmation; when the buffer pool is full, empty it), the buffered data is discarded immediately.
Ways to avoid data loss when producing:
As long as you avoid the two situations above, messages will not be lost. That is:
In synchronous mode, set the acknowledgment mechanism to -1, meaning a message counts as sent only after it is written to the leader and all replicas.
In asynchronous mode, for the case where messages have been sent but not yet acknowledged and the buffer pool is full, set an unlimited blocking timeout in the configuration file, so the producer blocks rather than discarding data.
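The two producer-side rules above translate into configuration roughly like this; which blocking property exists depends on the client version, so treat this as a sketch:

```java
props.put("acks", "all");  // equivalent to -1: wait for the leader and all in-sync replicas
props.put("max.block.ms", String.valueOf(Long.MAX_VALUE)); // effectively block forever instead of dropping
// On older clients the equivalent switch was block.on.buffer.full=true.
```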
The way to avoid data loss when consuming: if Storm is used, turn on Storm's ack/fail mechanism; if Storm is not used, update the offset only after the data has been fully processed. With the low-level API you need to control the offset manually.
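"Update the offset only after processing" is the classic at-least-once pattern. Below is a broker-free sketch of the control flow; in real code, the commit step would be consumer.commitSync(), and all names here are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

// Simulates at-least-once consumption: the offset is advanced only after
// a record has been fully processed, so a crash causes re-delivery of the
// in-flight record, never loss.
public class AtLeastOnceLoop {
    public long committedOffset = 0;          // last safely committed position
    public List<String> processed = new ArrayList<>();

    // Process records starting at committedOffset; crashAt simulates a
    // failure after processing record i but before committing it
    // (use -1 for no crash).
    public void run(List<String> log, int crashAt) {
        for (int i = (int) committedOffset; i < log.size(); i++) {
            processed.add(log.get(i));        // 1. do the work first
            if (i == crashAt) return;         // crash: offset i is NOT committed
            committedOffset = i + 1;          // 2. then commit (consumer.commitSync() in real code)
        }
    }
}
```

Note the trade-off: a crash between step 1 and step 2 means the record is processed twice on restart, which is exactly why the deduplication strategies below are needed.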
Handling duplicated data on the consumer side:
(1) Deduplicate: save each message's unique ID to external storage, and before processing check whether it has already been handled;
(2) Ignore: in big-data scenarios such as report systems or log analytics, a few lost or duplicated records do not affect the final statistical results.
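Strategy (1) can be sketched with an in-memory set standing in for the external store; the names are illustrative:

```java
import java.util.HashSet;
import java.util.Set;

// Idempotent consumer sketch: each message carries a unique ID, and a
// message is handled only if its ID has not been seen before. In
// production, the Set would be external storage (e.g. Redis or a DB
// table) shared by all consumer instances, so duplicates are caught
// even across restarts and rebalances.
public class Deduplicator {
    private final Set<String> seen = new HashSet<>();

    // Returns true if the message was new and handled,
    // false if it was a duplicate and skipped.
    public boolean handleOnce(String messageId) {
        if (!seen.add(messageId)) {
            return false;   // duplicate: add() found the ID already present
        }
        // ... real processing would happen here ...
        return true;
    }
}
```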