ERROR log event analysis in Kafka broker: kafka.common.NotAssignedReplicaException

The most critical log lines in this error event are shown below; most of the similar, repeated error content in the middle has been omitted.

[2017-12-27 18:26:09,267] ERROR [KafkaApi-2] Error when handling request Name: FetchRequest; Version: 2; CorrelationId: 44771537; ClientId: ReplicaFetcherThread-2-2; ReplicaId: 4; MaxWait: 500 ms; MinBytes: 1 bytes; RequestInfo: [test-topic02-rrt,12] -> PartitionFetchInfo(8085219,1048576),[test-topic01-ursr,22] -> PartitionFetchInfo(0,1048576),[test-topic02-rd,13] -> PartitionFetchInfo(787543,1048576),[test-topic02-st,12] -> PartitionFetchInfo(14804029,1048576),[test-topic04-ursr,7] -> PartitionFetchInfo(8,1048576),[test-topic04-rd,15] -> PartitionFetchInfo(2380092,1048576),[test-topic04-rrt,18] -> PartitionFetchInfo(27246143,1048576),[test-topic03-rrt,12] -> PartitionFetchInfo(12853720,1048576),[test-topic04-st,18] -> PartitionFetchInfo(25335299,1048576),[test-topic03-srt,11] -> PartitionFetchInfo(3750134,1048576),[test-topic05-ursd,17] -> PartitionFetchInfo(0,1048576),[test-topic05-srt,22] -> PartitionFetchInfo(33136074,1048576),[test-topic01-sd,1] -> PartitionFetchInfo(14361,1048576),[test-topic03-rd,21] -> PartitionFetchInfo(96366,1048576),[test-topic04-ursd,10] -> PartitionFetchInfo(0,1048576),[my-working-topic,15] -> PartitionFetchInfo(0,1048576),[test-topic02-ts_st,12] -> PartitionFetchInfo(0,1048576),[test-topic03-ursr,9] -> PartitionFetchInfo(1,1048576) (kafka.server.KafkaApis)kafka.common.NotAssignedReplicaException: Leader 2 failed to record follower 4's position -1 since the replica is not recognized to be one of the assigned replicas  for partition [my-working-topic,15].

We can identify at least two key pieces of information:

Error when handling request Name: FetchRequest
kafka.common.NotAssignedReplicaException: Leader 2 failed to record follower 4's position -1 since the replica is not recognized to be one of the assigned replicas for partition [my-working-topic,15]

1. Analysis of error message 1: "Error when handling request Name: FetchRequest". This tells us that Kafka encountered an error while handling a partition data synchronization (fetch) request. Two log entries above this line, there is a line showing that broker 2 had just stopped the data synchronization threads for four partitions of my-working-topic, namely 21, 15, 3, and 9:
[2017-12-27 18:26:09,219] INFO [ReplicaFetcherManager on broker 2] Removed fetcher for partitions [my-working-topic,21],[my-working-topic,15],[my-working-topic,3],[my-working-topic,9] (kafka.server.ReplicaFetcherManager)
Here is a point of background knowledge: the ReplicaFetcherManager class manages all data synchronization from leader replicas on the current broker. Its main methods, addFetcherForPartitions and removeFetcherForPartitions, start or stop the threads that synchronize the data of the specified partitions from their leader replicas.
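As a rough illustration, here is a minimal, self-contained Scala sketch of that responsibility. Everything below is a simplified model for illustration, not Kafka's actual source: real fetchers are threads and the real signatures carry more information; only the method names mirror the real ones.

    import scala.collection.mutable

    case class TopicAndPartition(topic: String, partition: Int)

    // Simplified model of ReplicaFetcherManager: one fetcher per partition,
    // tracked here as a map from partition to starting offset instead of a real thread.
    class ReplicaFetcherManagerModel(brokerId: Int) {
      private val fetchers = mutable.Map.empty[TopicAndPartition, Long]

      // Start synchronizing the given partitions from their leader replicas.
      def addFetcherForPartitions(partitionsAndOffsets: Map[TopicAndPartition, Long]): Unit = {
        fetchers ++= partitionsAndOffsets
        println(s"[ReplicaFetcherManager on broker $brokerId] Added fetcher for partitions " +
          partitionsAndOffsets.keys.mkString(","))
      }

      // Stop synchronizing the given partitions (the "Removed fetcher" log line above).
      def removeFetcherForPartitions(partitions: Set[TopicAndPartition]): Unit = {
        fetchers --= partitions
        println(s"[ReplicaFetcherManager on broker $brokerId] Removed fetcher for partitions " +
          partitions.mkString(","))
      }
    }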
The log contains the following content:
[my-working-topic,15] -> PartitionFetchInfo(0,1048576)
Here PartitionFetchInfo is defined as class PartitionFetchInfo(offset: Long, fetchSize: Int). It describes the replica synchronization information for one partition: the two parameters are the offset at which the replica's data synchronization starts and the maximum size of the data to fetch. In other words, this FetchRequest asks for a data segment of partition 15 of my-working-topic, starting at offset 0, with a fetch size of 1048576 bytes (1024*1024).
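To make that entry concrete, here is a small sketch. The class mirrors the definition just quoted (written as a case class so it can be constructed directly), and the map layout of RequestInfo is an assumption based on how the log prints it:

    case class PartitionFetchInfo(offset: Long, fetchSize: Int)

    // The [my-working-topic,15] entry of the RequestInfo shown in the log:
    // [my-working-topic,15] -> PartitionFetchInfo(0,1048576)
    val requestInfo: Map[(String, Int), PartitionFetchInfo] = Map(
      ("my-working-topic", 15) -> PartitionFetchInfo(offset = 0L, fetchSize = 1024 * 1024)
    )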
However, as we just analyzed, the data synchronization threads for four partitions of my-working-topic (21, 15, 3, and 9) had only just been stopped by the manager, yet a data synchronization request arriving 48 ms later still asks for data from partitions of that topic. The log above shows only the part for partition 15; the error content for the other three partitions is omitted. Kafka therefore has to report an error for such an inconsistent operation, and that error message is the second piece of log content we analyze.
2. kafka.common.NotAssignedReplicaException: Leader 2 failed to record follower 4's position -1 since the replica is not recognized to be one of the assigned replicas for partition [my-working-topic,15]. This message indicates that partition 15 of my-working-topic has its leader replica on broker 2. Broker 2 failed while recording the log position of a follower replica, namely the replica on broker 4, because that replica is currently not recognized as one of the assigned replicas of partition 15 of my-working-topic.
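The stack trace further below points at kafka.cluster.Partition.updateReplicaLogReadResult. As a self-contained sketch of the failing check (a simplified model under invented names, not the actual Kafka source):

    class NotAssignedReplicaException(msg: String) extends RuntimeException(msg)

    // Simplified model of the leader-side bookkeeping for one partition.
    class PartitionModel(topic: String, partitionId: Int, leaderId: Int,
                         assignedReplicas: Set[Int]) {
      private val followerPositions = scala.collection.mutable.Map.empty[Int, Long]

      // The leader records each follower's fetch position; a fetch from a replica
      // it does not recognize as assigned is rejected with the exception we saw.
      def updateReplicaLogReadResult(replicaId: Int, fetchOffset: Long): Unit =
        if (assignedReplicas.contains(replicaId))
          followerPositions(replicaId) = fetchOffset
        else
          throw new NotAssignedReplicaException(
            s"Leader $leaderId failed to record follower $replicaId's position $fetchOffset " +
            "since the replica is not recognized to be one of the assigned replicas " +
            s"for partition [$topic,$partitionId].")
    }

    // Broker 2 leads [my-working-topic,15] but has not yet learned from the controller
    // that broker 4 is one of its assigned replicas, so broker 4's fetch is rejected:
    val partition = new PartitionModel("my-working-topic", 15, leaderId = 2, assignedReplicas = Set(2, 3))
    partition.updateReplicaLogReadResult(replicaId = 4, fetchOffset = -1L) // throws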
The logs below cover only part of a single second; in reality these error logs were brief and concentrated. Why was this error reported, and what effect do such errors have on our production business? Start with the topic my-working-topic, which is exactly the topic raising the error. The reason is that this program version added some new features and created a new topic named my-working-topic. Kafka is a distributed cluster system, and each broker node runs multiple functional threads responsible for maintaining the data consistency and integrity of Kafka's nodes, partitions, and replicas. It appears that when a new topic is created with a large number of partitions and replicas, for example 24 partitions with 3 replicas each, transient data consistency conflicts like this one can occur. When the error occurs, the Kafka broker node suspends processing of the problematic data and waits for the Kafka controller broker to push the correct partition replica assignment; it then processes its local log files according to that correct information and starts the data synchronization threads for each partition of the topic. Therefore, as long as such errors do not keep flooding the logs, they are only intermediate state of this process; we can simply observe them, and they have no impact.
[2017-12-27 18:26:09,219] INFO [ReplicaFetcherManager on broker 2] Removed fetcher for partitions [my-working-topic,21],[my-working-topic,15],[my-working-topic,3],[my-working-topic,9] (kafka.server.ReplicaFetcherManager)
[2017-12-27 18:26:09,248] INFO Completed load of log my-working-topic01 with log end offset 0 (kafka.log.Log)
[2017-12-27 18:26:09,267] ERROR [KafkaApi-2] Error when handling request Name: FetchRequest; Version: 2; CorrelationId: 44771537; ClientId: ReplicaFetcherThread-2-2; ReplicaId: 4; MaxWait: 500 ms; MinBytes: 1 bytes; RequestInfo: [test-topic02-rrt,12] -> PartitionFetchInfo(8085219,1048576),[test-topic01-ursr,22] -> PartitionFetchInfo(0,1048576),[test-topic02-rd,13] -> PartitionFetchInfo(787543,1048576),[test-topic02-st,12] -> PartitionFetchInfo(14804029,1048576),[test-topic04-ursr,7] -> PartitionFetchInfo(8,1048576),[test-topic04-rd,15] -> PartitionFetchInfo(2380092,1048576),[test-topic04-rrt,18] -> PartitionFetchInfo(27246143,1048576),[test-topic03-rrt,12] -> PartitionFetchInfo(12853720,1048576),[test-topic04-st,18] -> PartitionFetchInfo(25335299,1048576),[test-topic03-srt,11] -> PartitionFetchInfo(3750134,1048576),[test-topic05-ursd,17] -> PartitionFetchInfo(0,1048576),[test-topic05-srt,22] -> PartitionFetchInfo(33136074,1048576),[test-topic01-sd,1] -> PartitionFetchInfo(14361,1048576),[test-topic03-rd,21] -> PartitionFetchInfo(96366,1048576),[test-topic04-ursd,10] -> PartitionFetchInfo(0,1048576),[my-working-topic,15] -> PartitionFetchInfo(0,1048576),[test-topic02-ts_st,12] -> PartitionFetchInfo(0,1048576),[test-topic03-ursr,9] -> PartitionFetchInfo(1,1048576) (kafka.server.KafkaApis)
kafka.common.NotAssignedReplicaException: Leader 2 failed to record follower 4's position -1 since the replica is not recognized to be one of the assigned replicas  for partition [my-working-topic,15].
    at kafka.cluster.Partition.updateReplicaLogReadResult(Partition.scala:251)
    at kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:864)
    at kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:861)
    at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:221)
    at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:428)
    at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:428)
    at kafka.server.ReplicaManager.updateFollowerLogReadResults(ReplicaManager.scala:861)
    at kafka.server.ReplicaManager.fetchMessages(ReplicaManager.scala:470)
    at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:496)
    at kafka.server.KafkaApis.handle(KafkaApis.scala:77)
    at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
    at java.lang.Thread.run(Thread.java:745)
.............................................................
[2017-12-27 18:26:09,387] INFO Partition [my-working-topic,13] on broker 2: No checkpointed highwatermark is found for partition [my-working-topic,13] (kafka.cluster.Partition)
[2017-12-27 18:26:09,388] INFO [ReplicaFetcherManager on broker 2] Removed fetcher for partitions [my-working-topic,5],[my-working-topic,10],[my-working-topic,8],[my-working-topic,13],[my-working-topic,18],[my-working-topic,4],[my-working-topic,19],[my-working-topic,14] (kafka.server.ReplicaFetcherManager)
[2017-12-27 18:26:09,388] INFO Truncating log my-working-topic-8 to offset 0. (kafka.log.Log)
[2017-12-27 18:26:09,388] INFO Truncating log my-working-topic-5 to offset 0. (kafka.log.Log)
[2017-12-27 18:26:09,388] INFO Truncating log my-working-topic-13 to offset 0. (kafka.log.Log)
[2017-12-27 18:26:09,388] INFO Truncating log my-working-topic-14 to offset 0. (kafka.log.Log)
[2017-12-27 18:26:09,388] INFO Truncating log my-working-topic-18 to offset 0. (kafka.log.Log)
[2017-12-27 18:26:09,388] INFO Truncating log my-working-topic-4 to offset 0. (kafka.log.Log)
[2017-12-27 18:26:09,389] INFO Truncating log my-working-topic-19 to offset 0. (kafka.log.Log)
[2017-12-27 18:26:09,389] INFO Truncating log my-working-topic-10 to offset 0. (kafka.log.Log)
[2017-12-27 18:26:09,393] INFO [ReplicaFetcherManager on broker 2] Added fetcher for partitions List([[my-working-topic,8], initOffset 0 to broker BrokerEndPoint(1,172.17.140.91,9092)] , [[my-working-topic,5], initOffset 0 to broker BrokerEndPoint(4,172.17.140.95,9092)] , [[my-working-topic,13], initOffset 0 to broker BrokerEndPoint(6,172.17.140.42,9092)] , [[my-working-topic,14], initOffset 0 to broker BrokerEndPoint(1,172.17.140.91,9092)] , [[my-working-topic,18], initOffset 0 to broker BrokerEndPoint(5,172.17.140.41,9092)] , [[my-working-topic,4], initOffset 0 to broker BrokerEndPoint(3,172.17.140.93,9092)] , [[my-working-topic,19], initOffset 0 to broker BrokerEndPoint(6,172.17.140.42,9092)] , [[my-working-topic,10], initOffset 0 to broker BrokerEndPoint(3,172.17.140.93,9092)] ) (kafka.server.ReplicaFetcherManager)
3. About INFO Partition [my-working-topic,13] on broker 2: No checkpointed highwatermark is found for partition [my-working-topic,13] (kafka.cluster.Partition)
This log line mentions the high watermark, which records the log offset up to which the current leader replica knows its data has been synchronized to the follower replicas; it is periodically saved to a checkpoint file. Why does it say "No checkpointed highwatermark is found"? Because my-working-topic has only just been created: its log files contain no data, so there is as yet no log offset or high watermark to checkpoint. That is why this message is only at the INFO level; it is a normal event. In the subsequent logs we can see that my-working-topic-13 is first cleared by truncating its log file to offset 0, and then its data synchronization thread is started ("Added fetcher for partitions").
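As a conceptual sketch (our own simplification, not Kafka's implementation): the high watermark can only advance to the smallest log end offset among the in-sync replicas, which is why a freshly created, empty partition has nothing to checkpoint yet.

    // High watermark: the largest offset known to be replicated to all in-sync
    // replicas, i.e. the minimum of their log end offsets.
    def highWatermark(logEndOffsets: Seq[Long]): Long = logEndOffsets.min

    highWatermark(Seq(120L, 118L, 119L)) // 118: followers are still catching up
    highWatermark(Seq(0L, 0L, 0L))       // 0: a brand-new, empty partition like my-working-topic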
Finally, check the topic status; everything is normal (a few Replicas/Isr values were garbled in the source excerpt and are left as they appeared):

$ ./kafka-topics.sh --describe --zookeeper 172.17.140.91:2181 --topic my-working-topic
Topic: my-working-topic    PartitionCount: 24    ReplicationFactor: 3    Configs:
    Topic: my-working-topic    Partition: 0    Leader: 5    Replicas: 5,3,4    Isr: 5,3,4
    Topic: my-working-topic    Partition: 1    Leader: 6    Replicas: 6,4,5    Isr: 6,4
    Topic: my-working-topic    Partition: 2    Leader: 1    Replicas: ,6    Isr: ,6
    Topic: my-working-topic    Partition: 3    Leader: 2    Replicas: 2,6,1    Isr: 2,6,1
    Topic: my-working-topic    Partition: 4    Leader: 3    Replicas: 3,1,2    Isr: 3,1,2
    Topic: my-working-topic    Partition: 5    Leader: 4    Replicas: 4,2,3    Isr: 4,2,3
    Topic: my-working-topic    Partition: 6    Leader: 5    Replicas: 5,4,6    Isr: 5,4,6
    Topic: my-working-topic    Partition: 7    Leader: 6    Replicas: 6,5,1    Isr: 6,5,1
    Topic: my-working-topic    Partition: 8    Leader: 1    Replicas: ,2    Isr: ,2
    Topic: my-working-topic    Partition: 9    Leader: 2    Replicas: ,3    Isr: ,3
    Topic: my-working-topic    Partition: 10    Leader: 3    Replicas: 3,2,4    Isr: 3,2,4
    Topic: my-working-topic    Partition: 11    Leader: 4    Replicas: 4,3,5    Isr: 4,3,5
    Topic: my-working-topic    Partition: 12    Leader: 5    Replicas: 5,6,1    Isr: 5,6,1
    Topic: my-working-topic    Partition: 13    Leader: 6    Replicas: 6,1,2    Isr: 6,1,2
    Topic: my-working-topic    Partition: 14    Leader: 1    Replicas: 1,2,3    Isr: 1,2,3
    Topic: my-working-topic    Partition: 15    Leader: 2    Replicas: 2,3,4    Isr: 2,3,4
    Topic: my-working-topic    Partition: 16    Leader: 3    Replicas: 3,4,5    Isr: 3,4,5
    Topic: my-working-topic    Partition: 17    Leader: 4    Replicas: ,6    Isr: ,6
    Topic: my-working-topic    Partition: 18    Leader: 5    Replicas: 5,1,2    Isr: 5,1,2
    Topic: my-working-topic    Partition: 19    Leader: 6    Replicas: 6,2,3    Isr: 6,2,3
    Topic: my-working-topic    Partition: 20    Leader: 1    Replicas: 1,3,4    Isr: 1,3,4
    Topic: my-working-topic    Partition: 21    Leader: 2    Replicas: 2,4,5    Isr: 2,4,5
    Topic: my-working-topic    Partition: 22    Leader: 3    Replicas: 3,5,6    Isr: 3,5,6
    Topic: my-working-topic    Partition: 23    Leader: 4    Replicas: 4,6,1    Isr: 4,6,1
