Translated from the original English post: https://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
This is a question frequently asked by many Kafka users. The purpose of this article is to explain a few important determining factors and to provide some simple formulas.
More partitions provide higher throughput
The first thing to understand is that a topic partition is the unit of parallelism in Kafka. On both the producer and the broker side, writes to different partitions can be done fully in parallel. Therefore, expensive operations such as compression can take advantage of more hardware resources. On the consumer side, Kafka always gives a single partition's data to one consumer thread. Therefore, the degree of parallelism of the consumers (within a consumer group) is limited by the number of partitions being consumed. In general, the more partitions there are in a Kafka cluster, the higher the throughput one can achieve.
A rough formula for selecting the number of partitions is based on throughput. You measure the throughput that you can achieve on a single partition for production (call it P) and for consumption (call it C). Say your target throughput is T. Then you need at least max(T/P, T/C) partitions. The per-partition throughput that one can achieve on the producer depends on configurations such as the batch size, compression codec, acknowledgement type, replication factor, and so on. However, in general, one can produce at tens of MB/sec on just a single partition. Consumer throughput is usually application dependent, because it corresponds to how quickly the consumer logic can process each message. So you really need to measure it.
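To make the formula concrete, here is a minimal sizing sketch in Java. The throughput figures in it (a 200 MB/s target, 30 MB/s per-partition produce, 20 MB/s per-partition consume) are made-up assumptions, not measurements; plug in values you measure on your own hardware.

    // Rough partition-count sizing sketch; the numbers are hypothetical assumptions.
    public class PartitionSizing {
        public static int requiredPartitions(double targetMBps,
                                             double producerMBpsPerPartition,
                                             double consumerMBpsPerPartition) {
            // max(T/P, T/C), rounded up to a whole number of partitions
            double byProducer = targetMBps / producerMBpsPerPartition;
            double byConsumer = targetMBps / consumerMBpsPerPartition;
            return (int) Math.ceil(Math.max(byProducer, byConsumer));
        }

        public static void main(String[] args) {
            // Example: T = 200 MB/s, P = 30 MB/s, C = 20 MB/s
            // => max(200/30, 200/20) = max(6.7, 10) = at least 10 partitions.
            System.out.println(requiredPartitions(200, 30, 20));
        }
    }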
Although you can increase the number of partitions over time, you have to be careful if messages are produced with keys. When a keyed message is published, Kafka maps the message to a partition deterministically based on the hash of the key. This provides a guarantee that messages with the same key are always routed to the same partition. This guarantee can be important for some applications, because messages within a partition are always delivered in order to the consumer. If the number of partitions changes, such a guarantee may no longer hold. To avoid this situation, a common practice is to over-partition a bit. Basically, you determine the number of partitions based on a future target throughput, say one or two years out. Initially, you can have just a small Kafka cluster based on your current throughput. Over time, you can add more brokers to the cluster and proportionally move a subset of the existing partitions to the new brokers (which can be done online). This way, when keys are used, you can keep up with throughput growth without breaking the semantics in your application.
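For illustration, here is a minimal sketch of publishing keyed messages with the Java producer; the broker address, topic name, key, and value are placeholders. With the default partitioner, records that share a key keep landing in the same partition only as long as the partition count stays the same.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class KeyedProducerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // All records keyed "user-42" are hashed to the same partition by the
                // default partitioner, so they stay ordered for the consumer -- as long
                // as the number of partitions of "my-topic" is not changed later.
                producer.send(new ProducerRecord<>("my-topic", "user-42", "some event"));
            }
        }
    }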
In addition to throughput, there are other factors that are worth considering when selecting the number of partitions. As you will see, in some cases, too many partitions can have a negative impact.
More partitions require more open file handles
Each partition maps to a directory in the file system on the broker. Within that log directory, each log segment has two files (one for the index and the other for the actual data). Currently, in Kafka, each broker opens a file handle for both the index and the data file of every log segment. Therefore, the more partitions, the higher the open file handle limit that needs to be configured in the underlying operating system. This is mostly just a configuration issue. We have seen production Kafka clusters running with more than 30,000 open file handles per broker.
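As a rough back-of-the-envelope sketch (the partition and segment counts below are hypothetical), the number of file handles a broker keeps open grows roughly as partitions × segments × 2:

    // Back-of-the-envelope estimate of open file handles on one broker.
    // The inputs are hypothetical; inspect your own log directories for real numbers.
    public class FileHandleEstimate {
        public static void main(String[] args) {
            int partitionsPerBroker = 2000;   // assumed partition count on this broker
            int avgSegmentsPerPartition = 5;  // assumed number of log segments per partition
            int filesPerSegment = 2;          // one index file + one data file

            long openHandles =
                (long) partitionsPerBroker * avgSegmentsPerPartition * filesPerSegment;
            System.out.println("Approximate open file handles needed: " + openHandles);
            // Compare this against the OS limit (e.g., `ulimit -n` on Linux) and raise it if needed.
        }
    }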
More partitions may increase unavailability
Kafka supports intra-cluster replication, which provides higher availability and durability. A partition can have multiple replicas, each stored on a different broker. One of the replicas is designated as the leader and the rest of the replicas are followers. Internally, Kafka manages all those replicas automatically and makes sure that they are kept in sync. Both producer and consumer requests to a partition are served on the leader replica. When a broker fails, partitions with a leader on that broker become temporarily unavailable. Kafka automatically moves the leader of those unavailable partitions to some other replicas to continue serving client requests. This process is done by one of the Kafka brokers designated as the controller. It involves reading and writing some metadata for each affected partition in ZooKeeper. Currently, operations to ZooKeeper are done serially in the controller.
Under normal circumstances, when a broker is shut down cleanly, the controller proactively moves the leaders off the shutting-down broker one at a time. Moving a single leader takes only a few milliseconds. Therefore, from the clients' point of view, there is only a small window of unavailability during a clean broker shutdown.
However, when a broker is shut down uncleanly (for example, kill -9), the observed unavailability may be proportional to the number of partitions. Suppose that a broker has a total of 2000 partitions, each with 2 replicas. Roughly speaking, this broker will be the leader for about 1000 partitions. When this broker fails uncleanly, all those 1000 partitions become unavailable at exactly the same time. Suppose that it takes 5 milliseconds to elect a new leader for a single partition. For all 1000 partitions, electing the new leaders will take up to 5 seconds. Therefore, for some partitions, their observed unavailability can be 5 seconds plus the time it takes to detect the failure.
If one is unlucky, the failed broker may be the controller. In this case, the process of electing the new leaders won't start until the controller fails over to a new broker. The controller failover happens automatically, but requires the new controller to read some metadata for every partition from ZooKeeper during initialization. For example, if there are 10,000 partitions in the Kafka cluster and initializing the metadata from ZooKeeper takes 2 ms per partition, this can add more than 20 seconds to the unavailability window.
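Putting the two failure scenarios above into numbers, here is a small sketch using the same assumed figures as in the text (they are illustrative, not measured values):

    // Rough estimate of the unavailability window after an unclean broker failure.
    public class UnavailabilityEstimate {
        public static void main(String[] args) {
            int leaderPartitionsOnFailedBroker = 1000; // broker leads ~1000 of its 2000 partitions
            double leaderElectionMsPerPartition = 5;   // assumed time to elect one new leader

            double electionSeconds =
                leaderPartitionsOnFailedBroker * leaderElectionMsPerPartition / 1000.0;
            System.out.println("Leader election for all partitions: ~" + electionSeconds + " s");

            // If the failed broker was also the controller, add the controller failover cost:
            int totalPartitionsInCluster = 10000;   // assumed cluster-wide partition count
            double zkReadMsPerPartition = 2;        // assumed ZooKeeper metadata read per partition
            double failoverSeconds =
                totalPartitionsInCluster * zkReadMsPerPartition / 1000.0;
            System.out.println("Extra controller failover time: ~" + failoverSeconds + " s");
        }
    }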
In general, unclean failures are rare. However, if you care about availability in those rare cases, it is probably better to limit the number of partitions per broker to 2,000 to 4,000 and the total number of partitions in the cluster to the low tens of thousands.
More partitions may increase end-to-end latency
End-to-end latency in Kafka is defined as the time from when a message is published by the producer to when it is read by the consumer. Kafka exposes a message to the consumer only after it has been committed, that is, when the message has been replicated to all in-sync replicas. Therefore, the time to commit a message can be a significant portion of the end-to-end latency. By default, a Kafka broker uses only a single thread to replicate data from another broker, for all partitions that share replicas between the two brokers. Our experiments show that replicating 1000 partitions from one broker to another can add a latency of approximately 20 milliseconds, which means that the end-to-end latency is at least 20 milliseconds. This may be too high for some real-time applications.
Note that this issue is mitigated on larger clusters. For example, suppose there are 1000 partition leaders on a broker and there are 10 other brokers in the same Kafka cluster. Each of the remaining 10 brokers only needs to fetch 100 partitions from the first broker on average. Therefore, the added latency due to committing messages will be only a few milliseconds, not tens of milliseconds.
As a rule of thumb, if you care about latency, it is probably a good idea to limit the number of partitions per broker to 100 × b × r, where b is the number of brokers in the Kafka cluster and r is the replication factor.
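As a quick illustration of this rule of thumb (the broker count and replication factor below are arbitrary assumptions):

    // Worked example of the 100 * b * r rule of thumb; b and r are illustrative assumptions.
    public class PartitionLimitRuleOfThumb {
        public static void main(String[] args) {
            int brokers = 10;           // b: assumed number of brokers in the cluster
            int replicationFactor = 2;  // r: assumed replication factor

            int maxPartitionsPerBroker = 100 * brokers * replicationFactor;
            // 100 * 10 * 2 = 2000 partitions per broker if latency matters.
            System.out.println("Suggested per-broker partition limit: " + maxPartitionsPerBroker);
        }
    }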
More partitions may require more memory in the client
In the most recent 0.8.2 release, which we ship with the Confluent Platform 1.0, we have developed a more efficient Java producer. A nice feature of the new producer is that it allows users to set an upper bound on the amount of memory used for buffering incoming messages. Internally, the producer buffers messages per partition. After enough data has been accumulated or enough time has passed, the accumulated messages are removed from the buffer and sent to the broker.
If you increase the number of partitions, messages will be accumulated in more partitions in the producer. The aggregate amount of memory used may now exceed the configured memory limit. When this happens, the producer has to either block or drop any new messages, neither of which is ideal. To prevent this from happening, you need to reconfigure the producer with a larger memory size.
As a rule of thumb, to achieve good throughput, you should allocate at least a few tens of KB per partition being produced in the producer, and adjust the total amount of memory if the number of partitions increases significantly.
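A minimal sizing sketch for the Java producer, assuming roughly 32 KB of buffer per partition; the broker address and exact numbers are placeholders, not recommendations:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.ProducerConfig;

    public class ProducerMemorySizing {
        public static Properties producerProps(int partitionsBeingProduced) {
            // Assumption: roughly 32 KB of buffer per partition (a "few tens of KB"),
            // with a floor of the default 32 MB total buffer.
            long perPartitionBytes = 32 * 1024L;
            long bufferMemory = Math.max(32 * 1024 * 1024L,
                                         partitionsBeingProduced * perPartitionBytes);

            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);          // per-partition batch size
            props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, bufferMemory); // total buffering memory
            return props;
        }
    }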
A similar issue exists for the consumer. The consumer fetches a batch of messages per partition. The more partitions a consumer consumes, the more memory it needs. However, this is typically only an issue for consumers that are not real time.
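A similar back-of-the-envelope estimate can be done on the consumer side; the figures below are assumptions, and max.partition.fetch.bytes refers to the newer Java consumer's per-partition fetch setting:

    // Rough estimate of consumer fetch memory: roughly one fetch batch per partition.
    // The numbers are assumptions; max.partition.fetch.bytes defaults to 1 MB in the Java consumer.
    public class ConsumerMemoryEstimate {
        public static void main(String[] args) {
            int partitionsAssigned = 500;               // assumed partitions assigned to this consumer
            long maxPartitionFetchBytes = 1024 * 1024L; // per-partition fetch size (default 1 MB)

            long approxBytes = partitionsAssigned * maxPartitionFetchBytes;
            System.out.println("Approximate fetch buffer memory: "
                               + (approxBytes / (1024 * 1024)) + " MB");
        }
    }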
Summary
In general, more partitions in a Kafka cluster lead to higher throughput. However, one has to be aware of the potential impact of having too many partitions, in total or per broker, on availability and latency. In the future, we plan to improve some of these limitations to make Kafka more scalable in terms of the number of partitions.