How to choose the number of topics/partitions in a Kafka cluster?


This is a common question asked by many Kafka users. The goal of this post is to explain a few important determining factors and to provide a few simple formulas.

More Partitions Lead to Higher Throughput

The first thing to understand is that a topic partition is the unit of parallelism in Kafka. On both the producer and the broker side, writes to different partitions can be done fully in parallel, so expensive operations such as compression can utilize more hardware resources. On the consumer side, Kafka always gives a single partition's data to one consumer thread. Thus, the degree of parallelism in the consumer (within a consumer group) is bounded by the number of partitions being consumed. Therefore, in general, the more partitions there are in a Kafka cluster, the higher the throughput one can achieve.

A rough formula for picking the number of partitions is based on throughput. You measure the throughput that you can achieve on a single partition for production (call it p) and consumption (call it c). Let's say your target throughput is t. Then you need at least max(t/p, t/c) partitions. The per-partition throughput that one can achieve on the producer depends on configurations such as the batching size, compression codec, type of acknowledgement, replication factor, etc. However, in general, one can produce at 10s of MB/sec on just a single partition, as shown in this benchmark. The consumer throughput is often application dependent, since it corresponds to how fast the consumer logic can process each message. So, you really need to measure it.
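For example, here is a minimal sketch of that sizing arithmetic in Java; the target and per-partition throughput figures are illustrative assumptions, not measurements:

```java
// Back-of-the-envelope partition count: max(t/p, t/c), rounded up.
public class PartitionSizing {
    static int requiredPartitions(double targetMBps, double producerMBpsPerPartition,
                                  double consumerMBpsPerPartition) {
        double neededForProduction = targetMBps / producerMBpsPerPartition;
        double neededForConsumption = targetMBps / consumerMBpsPerPartition;
        return (int) Math.ceil(Math.max(neededForProduction, neededForConsumption));
    }

    public static void main(String[] args) {
        // Hypothetical numbers: target t = 200 MB/sec, measured p = 20 MB/sec, c = 50 MB/sec.
        System.out.println(requiredPartitions(200, 20, 50)); // prints 10
    }
}
```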

Although it's possible to increase the number of partitions over time, one has to be careful if messages are produced with keys. When publishing a keyed message, Kafka deterministically maps the message to a partition based on the hash of the key. This provides a guarantee that messages with the same key are always routed to the same partition. This guarantee can be important for certain applications, since messages within a partition are always delivered in order to the consumer. If the number of partitions changes, such a guarantee may no longer hold. To avoid this situation, a common practice is to over-partition a bit. Basically, you determine the number of partitions based on a future target throughput, say for one or two years later. Initially, you can just have a small Kafka cluster based on your current throughput. Over time, you can add more brokers to the cluster and proportionally move a subset of the existing partitions to the new brokers (which can be done online). This way, you can keep up with the throughput growth without breaking the semantics of the application when keys are used.
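As an illustration of keyed publishing, the sketch below uses the standard Java producer client; the broker address and topic name are hypothetical. With a non-null key, the default partitioner hashes the key to choose a partition, so as long as the partition count stays the same, all messages with that key stay in order on one partition.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class KeyedProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Both records share the key "user-42", so they hash to the same partition
            // and are delivered to the consumer in the order they were produced.
            producer.send(new ProducerRecord<>("orders", "user-42", "order-created"));
            producer.send(new ProducerRecord<>("orders", "user-42", "order-paid"));
        }
    }
}
```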

In addition to throughput, there are a few other factors that are worth considering when choosing the number of partitions. As you will see, in some cases, having too many partitions may also have a negative impact.

More Partitions Require More Open File Handles

Each partition maps to a directory in the file system on the broker. Within that log directory, there will be two files (one for the index and another for the actual data) per log segment. Currently, in Kafka, each broker opens a file handle for both the index and the data file of every log segment. So, the more partitions, the higher the open file handle limit one needs to configure in the underlying operating system. This is mostly just a configuration issue. We have seen production Kafka clusters running with more than 30 thousand open file handles per broker.
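A rough way to estimate the handle count is sketched below; the number of segments per partition is an assumption that depends on retention settings and segment size.

```java
public class FileHandleEstimate {
    public static void main(String[] args) {
        // Each log segment keeps one data file and one index file open on the broker.
        long partitionsPerBroker = 2000;   // assumption
        long segmentsPerPartition = 10;    // assumption; depends on retention and segment size
        long openFileHandles = partitionsPerBroker * segmentsPerPartition * 2;
        System.out.println(openFileHandles + " open file handles; set the OS limit above this");
    }
}
```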

More Partitions May Increase Unavailability

Kafka supports intra-cluster replication, which provides higher availability and durability. A partition can have multiple replicas, each stored on a different broker. One of the replicas is designated as the leader and the rest of the replicas are followers. Internally, Kafka manages all those replicas automatically and makes sure that they are kept in sync. Both producer and consumer requests to a partition are served on the leader replica. When a broker fails, partitions with a leader on that broker become temporarily unavailable. Kafka will automatically move the leader of those unavailable partitions to some other replicas to continue serving client requests. This process is done by one of the Kafka brokers designated as the controller. It involves reading and writing some metadata for each affected partition in ZooKeeper. Currently, operations to ZooKeeper are done serially in the controller.

In the common case when a broker is shut down cleanly, the controller will proactively move the leaders off the shutting-down broker one at a time. Moving a single leader takes only a few milliseconds. So, from the clients' perspective, there is only a small window of unavailability during a clean broker shutdown.

However, when a broker is shut down uncleanly (e.g., kill -9), the observed unavailability could be proportional to the number of partitions. Suppose that a broker has a total of 2000 partitions, each with 2 replicas. Roughly, this broker will be the leader for about 1000 partitions. When this broker fails uncleanly, all those 1000 partitions become unavailable at exactly the same time. Suppose that it takes 5 ms to elect a new leader for a single partition. It will take up to 5 seconds to elect new leaders for all 1000 partitions. So, for some partitions, the observed unavailability can be 5 seconds plus the time taken to detect the failure.

If one is unlucky, the failed broker may be the controller. In this case, the process of electing the new leaders won't start until the controller fails over to a new broker. The controller failover happens automatically, but requires the new controller to read some metadata for every partition from ZooKeeper during initialization. For example, if there are 10,000 partitions in the Kafka cluster and initializing the metadata from ZooKeeper takes 2 ms per partition, this can add 20 more seconds to the unavailability window.
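Both windows above come from simple multiplication; here is a sketch of the arithmetic using the example figures from this post:

```java
public class UnavailabilityEstimate {
    public static void main(String[] args) {
        // Unclean broker failure: ~1000 leader partitions, ~5 ms to elect each new leader.
        long leaderPartitions = 1000;
        long electionMsPerPartition = 5;
        System.out.println("Leader elections: "
                + leaderPartitions * electionMsPerPartition + " ms");   // 5000 ms

        // Controller failover: metadata for every partition is re-read from ZooKeeper.
        long totalPartitions = 10_000;
        long zkReadMsPerPartition = 2;
        System.out.println("Controller initialization: "
                + totalPartitions * zkReadMsPerPartition + " ms");      // 20000 ms
    }
}
```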

In general, unclean failures are rare. However, if one cares about availability in those rare cases, it's probably better to limit the number of partitions per broker to two to four thousand and the total number of partitions in the cluster to the low tens of thousands.

More Partitions May Increase End-to-End Latency

The end-to-end latency in Kafka is defined by the time from when a message is published by the producer to when the message is read by the consumer. Kafka only exposes a message to a consumer after it has been committed, i.e., when the message has been replicated to all the in-sync replicas. So, the time to commit a message can be a significant portion of the end-to-end latency. By default, a Kafka broker uses only a single thread to replicate data from another broker, for all partitions that share replicas between the two brokers. Our experiments show that replicating 1000 partitions from one broker to another can add about 20 ms of latency, which implies that the end-to-end latency is at least 20 ms. This can be too high for some real-time applications.
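For context, whether the producer waits for that commit is controlled by the acks setting of the Java producer, and the broker-side num.replica.fetchers setting controls how many threads a broker uses to replicate from another broker (one by default, as described above). The snippet below is a minimal sketch with a hypothetical broker address and topic name.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class CommittedWriteExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical
        props.put("acks", "all"); // acknowledge only after the write is committed to the in-sync replicas
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // get() blocks until the broker acknowledges the record; with acks=all,
            // the replication time is part of the latency observed here.
            producer.send(new ProducerRecord<>("events", "key", "value")).get();
        }
    }
}
```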

Note that this issue is alleviated on a larger cluster. For example, suppose that there are 1000 partition leaders on a broker and there are 10 other brokers in the same Kafka cluster. Each of the remaining brokers only needs to fetch 100 partitions from the first broker on average. Therefore, the added latency due to committing a message will be just a few ms, instead of tens of ms.

As a rule of thumb, if you care about latency, it's probably a good idea to limit the number of partitions per broker to 100 x b x r, where b is the number of brokers in the Kafka cluster and r is the replication factor.
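Plugging illustrative numbers into that rule of thumb (the cluster size and replication factor here are assumptions):

```java
public class LatencyRuleOfThumb {
    public static void main(String[] args) {
        int brokers = 10;           // b, hypothetical
        int replicationFactor = 2;  // r, hypothetical
        int partitionLimitPerBroker = 100 * brokers * replicationFactor;
        System.out.println(partitionLimitPerBroker); // 2000
    }
}
```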

More Partitions Could Require More Memory in the Client

In the most recent 0.8.2 release, which we ship with the Confluent Platform 1.0, we have developed a more efficient Java producer. One of the nice features of the new producer is that it allows users to set an upper bound on the amount of memory used for buffering incoming messages. Internally, the producer buffers messages per partition. After enough data has been accumulated or enough time has passed, the accumulated messages are removed from the buffer and sent to the broker.

If one increases the number of partitions, messages will be accumulated in more partitions in the producer. The aggregate amount of memory used may now exceed the configured memory limit. When this happens, the producer has to either block or drop new messages, neither of which is ideal. To prevent this from happening, one will need to reconfigure the producer with a larger memory size.

As a rule of thumb, to achieve good throughput, one should allocate at least a few tens of KB per partition being produced in the producer, and adjust the total amount of memory if the number of partitions increases significantly.
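Here is a sketch of how those knobs look on the new Java producer; the byte values below are illustrative assumptions, not recommendations:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerMemoryConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // batch.size is the per-partition buffer; 32 KB is an illustrative value.
        props.put("batch.size", 32 * 1024);
        // linger.ms bounds how long records may sit in the buffer before being sent.
        props.put("linger.ms", 5);
        // buffer.memory is the total upper bound; scale it with the number of
        // partitions being produced to (roughly 32 KB x 1000 partitions here).
        props.put("buffer.memory", 32 * 1024 * 1024);

        new KafkaProducer<String, String>(props).close();
    }
}
```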

A similar issue exists in the consumer as well. The consumer fetches a batch of messages per partition. The more partitions a consumer consumes, the more memory it needs. However, this is typically only an issue for consumers that are not real time.
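In the newer Java consumer (released after the version this post discusses), the relevant knob is max.partition.fetch.bytes, which bounds how much data is buffered per assigned partition; below is a minimal sketch with hypothetical broker, group, and topic names.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ConsumerMemoryConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical
        props.put("group.id", "example-group");           // hypothetical
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        // Up to this many bytes may be buffered per assigned partition, so
        // consumer memory grows with the number of partitions it consumes.
        props.put("max.partition.fetch.bytes", 1024 * 1024);

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("orders"));
            consumer.poll(Duration.ofMillis(100));
        }
    }
}
```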

Summary

In general, more partitions in a Kafka cluster lead to higher throughput. However, one does have to be aware of the potential impact of having too many partitions in total or per broker on things like availability and latency. In the future, we do plan to improve some of those limitations to make Kafka more scalable in terms of the number of partitions.


English Original: http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
