How do I choose the number of topics/partitions in a Kafka cluster?
This is a common question asked by many Kafka users. The goal of this post is to explain a few important determining factors and provide a few simple formulas.

More partitions lead to higher throughput
The "the" the "the" a topic partition is the the "thing to understand" is the "unit" Ofparallelism in Kafka. On both the producer and the broker side, writes Todifferent partitions can do fully in parallel. So expensive Operationssuch as compression can utilize more hardware. On the consumer Side,kafka always gives a single partition ' s data to one consumer thread. Thus, Thedegree of parallelism in the consumer (within a consumer group) is bounded bythe number of partitions being Consu Med. Therefore, in general, the morepartitions there are in a Kafka cluster, the higher one throughput.
First we need to understand the fact that in Kafka, a single patition is the smallest unit of Kafka parallel operations.
At the producer and broker ends, writing data to each partition can be fully parallel, at which point the throughput of the system can be increased by increasing the utilization of the hardware resources, such as compressing the data.
In the consumer segment, Kafka allows only a single partition of data to be consumed by one consumer thread. Therefore, at the consumer end, the consumer parallelism within each consumer group is entirely dependent on the number of partitions consumed.
To sum up, in general, in a Kafka cluster, the greater the number of partition, the greater the throughput that can be reached.
A rough formula for picking the number of partitions is based on throughput. You measure the throughput that you can achieve on a single partition for production (call it p) and consumption (call it c). Let's say your target throughput is t. Then you need at least max(t/p, t/c) partitions. The per-partition throughput that one can achieve on the producer depends on configurations such as the batch size, compression codec, type of acknowledgement, replication factor, and so on. However, in general, one can produce at 10s of MB/sec on just a single partition. The consumer throughput is often application dependent, since it corresponds to how fast the consumer logic can process each message, so you really need to measure it.
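As a quick sketch of that formula, with purely hypothetical numbers (t, p, and c here are made up; you must measure them for your own workload):

    public class PartitionCount {
        public static void main(String[] args) {
            double t = 200.0; // target throughput, MB/sec (hypothetical)
            double p = 20.0;  // measured producer throughput per partition, MB/sec (hypothetical)
            double c = 50.0;  // measured consumer throughput per partition, MB/sec (hypothetical)

            // You need at least max(t/p, t/c) partitions.
            int partitions = (int) Math.ceil(Math.max(t / p, t / c));
            System.out.println("Minimum partitions: " + partitions); // prints 10
        }
    }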
Although it's possible to increase the number of partitions over time, one has to be careful if messages are produced with keys. When publishing a keyed message, Kafka deterministically maps the message to a partition based on the hash of the key. This provides a guarantee that messages with the same key are always routed to the same partition. This guarantee can be important for certain applications, since messages within a partition are always delivered in order to the consumer. If the number of partitions changes, such a guarantee may no longer hold. To avoid this situation, a common practice is to over-partition a bit: you determine the number of partitions based on a future target throughput, say for one or two years later. Initially, you can just have a small Kafka cluster based on your current throughput. Over time, you can add more brokers to the cluster and proportionally move a subset of the existing partitions to the new brokers (which can be done online). This way, you can keep up with the throughput growth without breaking the semantics in the application when keys are used.
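To make the key semantics concrete, here is a minimal sketch of a keyed producer using the Java client (the topic name, key, and broker address are hypothetical):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class KeyedProducerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker address
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            KafkaProducer<String, String> producer = new KafkaProducer<>(props);
            // Both records carry the key "user-42", so they hash to the same partition
            // and are consumed in order -- as long as the partition count of
            // "user-events" does not change.
            producer.send(new ProducerRecord<>("user-events", "user-42", "login"));
            producer.send(new ProducerRecord<>("user-events", "user-42", "purchase"));
            producer.close();
        }
    }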
In addition to throughput, there are a few other factors that are worth considering when choosing the number of partitions. As you will see, in some cases, having too many partitions may also have a negative impact.

More partitions require more open file handles
Each partition maps to a directory in the file system in the broker. Within that log directory, there will be two files (one for the index and another for the actual data) per log segment. Currently, in Kafka, each broker opens a file handle for both the index and the data file of every log segment. So, the more partitions, the higher the open file handle limit one needs to configure in the underlying operating system. This is mostly just a configuration issue. We have seen production Kafka clusters running with more than 30 thousand open file handles per broker.

More partitions may increase unavailability
Kafka supports intra-cluster replication, which provides higher availability and durability. A partition can have multiple replicas, each stored on a different broker. One of the replicas is designated as the leader and the rest of the replicas are followers. Internally, Kafka manages all those replicas automatically and makes sure that they are kept in sync. Both producer and consumer requests to a partition are served by the leader replica. When a broker fails, partitions with a leader on that broker become temporarily unavailable. Kafka will automatically move the leader of those unavailable partitions to some other replicas in order to continue serving client requests. This process is done by one of the Kafka brokers designated as the controller, and it involves reading and writing some metadata for each affected partition in ZooKeeper. Currently, operations to ZooKeeper are done serially in the controller.
In the common case when a broker is shut down cleanly, the controller will proactively move the leaders off the shutting-down broker one at a time. Moving a single leader takes only a few milliseconds, so from the clients' perspective there is only a small window of unavailability during a clean broker shutdown. (Note: during a clean shutdown, only one leader is transferred at a time; all the other leaders remain available.)
However, when a broker is shut down uncleanly (e.g., kill -9), the observed unavailability can be proportional to the number of partitions. Suppose that a broker has a total of 2000 partitions, each with 2 replicas. Roughly, this broker will be the leader for about 1000 partitions. When this broker fails uncleanly, all those 1000 partitions become unavailable at exactly the same time. Suppose that it takes 5 ms to elect a new leader for a single partition; it will then take up to 5 seconds to elect new leaders for all 1000 partitions. So, for some partitions, the observed unavailability can be 5 seconds plus the time taken to detect the failure.
If one is unlucky, the failed broker may be the controller. In this case, the process of electing the new leaders won't start until the controller fails over to a new broker. The controller failover happens automatically, but requires the new controller to read some metadata for every partition from ZooKeeper during initialization. For example, if there are 10,000 partitions in the Kafka cluster and initializing the metadata from ZooKeeper takes 2 ms per partition, this can add 20 more seconds to the unavailability window.
In general, unclean failures are rare. However, if one cares about availability in those rare cases, it's probably better to limit the number of partitions per broker to two to four thousand and the total number of partitions in the cluster to the low tens of thousands.

More partitions may increase end-to-end latency
End-to-end latency in Kafka is defined by the time from when a message is published by the producer to when the message is read by the consumer. Kafka only exposes a message to a consumer after it has been committed, i.e., when the message has been replicated to all the in-sync replicas. So, the time to commit a message can be a significant portion of the end-to-end latency. By default, a Kafka broker uses only a single thread to replicate data from another broker, for all partitions that share replicas between the two brokers. Our experiments show that replicating 1000 partitions from one broker to another can add about 20 ms of latency, which implies that the end-to-end latency is at least 20 ms. This can be too high for some real-time applications.
Note that this issue is alleviated on a larger cluster. For example, suppose that there are 1000 partition leaders on a broker and there are 10 other brokers in the same Kafka cluster. Each of the remaining 10 brokers then only needs to fetch 100 partitions from the first broker on average. Therefore, the added latency due to committing a message will be just a few milliseconds instead of tens of milliseconds.
As a rule of thumb, if you care about latency, it's probably a good idea to limit the number of partitions per broker to 100 x b x r, where b is the number of brokers in a Kafka cluster and r is the replication factor.
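Plugging hypothetical numbers into that rule of thumb (b and r come from your own cluster, not from this post):

    public class LatencyRuleOfThumb {
        public static void main(String[] args) {
            int b = 10; // number of brokers in the cluster (hypothetical)
            int r = 2;  // replication factor (hypothetical)
            // Rule of thumb from the text: limit partitions per broker to 100 x b x r.
            int limit = 100 * b * r;
            System.out.println("Partitions per broker limit: " + limit); // prints 2000
        }
    }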
More partitions may require more memory in the client
In the most recent 0.8.2 release, which we ship with the Confluent Platform 1.0, we have developed a more efficient Java producer. One of the nice features of the new producer is that it allows users to set an upper bound on the amount of memory used for buffering incoming messages. Internally, the producer buffers messages per partition. After enough data has accumulated or enough time has passed, the accumulated messages are removed from the buffer and sent to the broker.
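A minimal sketch of how those knobs look on the new Java producer (the broker address and the specific values are placeholders, not recommendations):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;

    public class ProducerBufferingSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker address
            props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

            props.put("buffer.memory", "33554432"); // upper bound on buffered bytes (32 MB here)
            props.put("batch.size", "16384");       // per-partition batch size in bytes
            props.put("linger.ms", "5");            // how long to wait for a batch to fill up

            KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props);
            // ... send records ...
            producer.close();
        }
    }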
If one increases the number of partitions, messages will be accumulated in more partitions in the producer. The aggregate amount of memory used may then exceed the configured memory limit. When this happens, the producer has to either block or drop any new messages, neither of which is ideal. To prevent this from happening, one needs to reconfigure the producer with a larger memory size.
As a rule of thumb, to achieve good throughput, one should allocate at least a few tens of KB of memory per partition being produced in the producer, and adjust the total amount of memory if the number of partitions increases significantly.
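For instance, a back-of-the-envelope sizing under that guideline (the per-partition figure and partition count are assumptions, not measurements):

    public class ProducerBufferSizing {
        public static void main(String[] args) {
            int partitions = 2000;             // partitions being produced to (hypothetical)
            int bytesPerPartition = 32 * 1024; // "a few tens of KB" per partition (assumed 32 KB)
            long bufferMemory = (long) partitions * bytesPerPartition;
            // About 65 MB; a value in this ballpark would go into the producer's buffer.memory setting.
            System.out.println("Suggested buffer.memory bytes: " + bufferMemory);
        }
    }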
A similar issue exists in the consumer as well. The consumer fetches a batch of messages per partition, so the more partitions a consumer consumes, the more memory it needs. However, this is typically only an issue for consumers that are not real time.
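To see why, here is a rough sketch of the memory math on the consumer side (the partition count and per-partition fetch size are assumptions; the latter is governed by the per-partition fetch setting of whichever consumer client you use):

    public class ConsumerMemorySketch {
        public static void main(String[] args) {
            int partitionsConsumed = 1000;        // partitions assigned to this consumer (hypothetical)
            int fetchBytesPerPartition = 1 << 20; // per-partition fetch size, 1 MB (assumed)
            long worstCaseBufferedBytes = (long) partitionsConsumed * fetchBytesPerPartition;
            // Roughly 1 GB of fetched data may be held in memory at once in the worst case.
            System.out.println("Worst-case fetched bytes: " + worstCaseBufferedBytes);
        }
    }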
Summary
In general, more partitions in a Kafka cluster lead to higher throughput. However, one does have to be aware of the potential impact of having too many partitions in total or per broker on things like availability and latency. In the future, we do plan to improve some of those limitations to make Kafka more scalable in terms of the number of partitions.