This post answers a few questions raised by readers; if anything is unclear, you can refer back to the previous article.
1. How should Kafka's deletion (retention) policy be configured? To improve performance, should consumed data be deleted after one hour?
It can be configured according to the size of the disk; as long as disk space is sufficient, there is no need to worry about deletion at all. Kafka's throughput does not decrease as the data volume grows, because Kafka reads and writes data strictly sequentially and only records the offset, so the time complexity is O(1). I have tested this on terabytes of data and performance was completely unaffected. On the contrary, deleting data too quickly makes it easy to lose data.
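For reference, here is a minimal sketch of how retention could be set at the topic level with the Java AdminClient. The broker address, the topic name `user-events`, and the concrete size and time limits are assumptions for illustration only; broker-wide defaults such as `log.retention.hours` and `log.retention.bytes` can be set in `server.properties` instead.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class RetentionConfigExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker address

        try (Admin admin = Admin.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "user-events");

            // Size-based retention: keep up to ~100 GB per partition...
            AlterConfigOp retentionBytes = new AlterConfigOp(
                    new ConfigEntry("retention.bytes", String.valueOf(100L * 1024 * 1024 * 1024)),
                    AlterConfigOp.OpType.SET);
            // ...and time-based retention: keep data for 7 days.
            AlterConfigOp retentionMs = new AlterConfigOp(
                    new ConfigEntry("retention.ms", String.valueOf(7L * 24 * 60 * 60 * 1000)),
                    AlterConfigOp.OpType.SET);

            Map<ConfigResource, Collection<AlterConfigOp>> changes =
                    Map.of(topic, List.of(retentionBytes, retentionMs));
            admin.incrementalAlterConfigs(changes).all().get();
        }
    }
}
```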
2. If sending a message fails and the specified number of retries has been reached, how should that be handled?
The client can set the number of retries and the retry interval. Because Kafka generally runs as a cluster, it rarely helps to keep retrying forever; the common case is that the application has lost its connection to the Kafka cluster. In fact, if the application crashes while retrying, the message is lost. To avoid this you need to persist the message, and you can choose between local and remote persistence. Local persistence is not very safe, because application servers today are often virtual machines or containers; remote persistence is relatively safer. But remote persistence requires the network, so what if the remote persistence itself happens to fail? For this kind of problem, the last lifeline is the log. This type of problem exists not only in MQ but also in storage; it is common in distributed scenarios, but because its probability is small, it is often overlooked by developers. This is also why reconciliation never quite balances. It is well worth weighing how to handle such low-probability events: important systems usually have a scheduled reconciliation check as a compensation mechanism for them.
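As a rough illustration of the retry settings and the fallback persistence described above, here is a sketch using the Java producer. The broker address, topic name, and the local file used as the fallback store are assumptions for illustration, not a recommendation of a specific design.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class RetryAwareProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical cluster address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");               // wait for all in-sync replicas
        props.put("retries", "5");              // number of client-side retries
        props.put("retry.backoff.ms", "1000");  // interval between retries

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("user-events", "user-42", "payload");

            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    // All retries exhausted: fall back to persisting the message.
                    // A local file is the simplest option; a remote store is safer
                    // but can itself fail, so the application log is the last resort.
                    try {
                        Files.write(Paths.get("failed-messages.log"),
                                (record.key() + "\t" + record.value() + "\n")
                                        .getBytes(StandardCharsets.UTF_8),
                                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
                    } catch (Exception io) {
                        io.printStackTrace(); // last line of defense: the log
                    }
                }
            });
        }
    }
}
```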
3. If the total number of replicas is f, how many replicas are allowed to be lost?
At most f-1 replicas can be lost; in other words, the data survives as long as one replica remains. This of course also depends on the broker's configuration. From the server's point of view, distributing updated data to the whole system as quickly as possible, and shrinking the time window before eventual consistency is reached, is an important aspect of improving a system's availability and user experience. For a distributed data system:
a) n - the number of data replicas
b) w - the number of nodes that must acknowledge a write before it is considered complete
c) r - the number of nodes that must be read when reading data
For any distributed system, maintaining strong consistency on the server side requires w + r > n. For example, with n = 3 nodes, if a write only returns after all three nodes have written successfully (w = 3), then reading from any single surviving node (r = 1) is enough to guarantee that the data read is up to date, since 3 + 1 > 3.
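Kafka itself does not use a strict w + r > n read/write quorum (its replication is based on the in-sync replica set), but the write side of this trade-off maps onto the replication factor, `min.insync.replicas`, and the producer's `acks` setting. A minimal sketch, assuming a topic named `user-events` on a 3-broker cluster:

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;

public class ReplicatedTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker address

        try (Admin admin = Admin.create(props)) {
            // n = 3 replicas; require at least 2 in-sync replicas to acknowledge a write
            // (the producer should also set acks=all for this to take effect).
            NewTopic topic = new NewTopic("user-events", 6, (short) 3)
                    .configs(Map.of("min.insync.replicas", "2"));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```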
4. Are messages in Kafka ordered?
Within the same partition they are strictly ordered. The producer can set the partitioning strategy, including a custom one, so partitioning can follow the business. For example, if the messages are related to users, they can be partitioned by user ID; all operations of the same user then go to the same partition, and per-user ordering is achieved (see the sketch below).
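As a sketch of such a business-level partitioning strategy, the custom partitioner below hashes the record key, assumed here to be the user ID, so that the same user always lands on the same partition; the class name is made up for illustration.

```java
import java.util.Map;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.utils.Utils;

/**
 * Routes every message of the same user to the same partition, preserving
 * per-user ordering. Assumes the record key is the user ID.
 */
public class UserIdPartitioner implements Partitioner {

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionCountForTopic(topic);
        if (keyBytes == null) {
            return 0; // no user ID on the record: fall back to partition 0
        }
        // Same user ID -> same hash -> same partition.
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    @Override
    public void close() {}

    @Override
    public void configure(Map<String, ?> configs) {}
}
```

The producer would pick it up via `props.put("partitioner.class", UserIdPartitioner.class.getName());`.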
Of course, ordering also has a downside: ordering implies blocking. If consuming a particular message keeps failing, the whole consumption process is blocked. A more flexible approach is to retry up to a certain number of times, persist the message remotely, skip it, and continue consuming, which of course means giving up ordering.
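A rough sketch of this retry-then-skip pattern on the consumer side; the retry budget, topic, group id, and the `process`/`parkForLater` helpers are hypothetical placeholders.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SkipOnFailureConsumer {
    private static final int MAX_RETRIES = 3; // assumed retry budget per message

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical cluster address
        props.put("group.id", "demo-group");
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singleton("user-events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    boolean handled = false;
                    for (int attempt = 0; attempt < MAX_RETRIES && !handled; attempt++) {
                        try {
                            process(record);   // hypothetical business logic
                            handled = true;
                        } catch (Exception e) {
                            // swallow and retry up to MAX_RETRIES times
                        }
                    }
                    if (!handled) {
                        // Give up ordering for this message: park it somewhere
                        // (e.g. a remote store or "dead letter" topic) and move on,
                        // so the partition is not blocked forever.
                        parkForLater(record);  // hypothetical persistence hook
                    }
                }
                consumer.commitSync();
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) { /* business logic */ }

    private static void parkForLater(ConsumerRecord<String, String> record) { /* persist elsewhere */ }
}
```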
Why Kafka (Part 2)