How to determine the number of partitions, key, and consumer threads for Kafka

Source: Internet
Author: User
In the QQ groups of the Kafka Chinese community, this question comes up very often; it is one of the most common problems Kafka users run into. This article draws on the Kafka source code to discuss the factors involved. I hope it helps.

How to determine the number of partitions

"How many partitions should I choose?" If you are in one of the Kafka Chinese community groups, you will see this question often. Unfortunately, there does not seem to be an authoritative answer, which is not surprising: questions like this rarely have a fixed one. The Kafka website describes Kafka as a "high-throughput distributed messaging system". So how does it achieve high throughput? At the lowest level Kafka abandons Java heap caching in favor of the operating system's page cache, turns random writes into sequential writes, and combines this with zero-copy to greatly improve I/O performance. But this is only one side of the story, since single-machine optimization has an upper limit. How can throughput be increased further by scaling horizontally, ideally linearly? Kafka uses partitions: the messages of a topic are spread over multiple partitions, which are distributed across different brokers, so that message processing is high-throughput on both the producer and the consumer side.

Kafka producers and consumers can operate in parallel across multiple threads, with each thread handling the data of one partition. The partition is therefore the smallest unit for tuning Kafka's parallelism. On the producer side, multiple threads send messages to these partitions concurrently, opening socket connections to the brokers that host them. On the consumer side, each consumer thread in a consumer group is assigned specific partitions of the topic to consume (how to choose the number of consumer threads is explained later). So in theory, the more partitions a topic has, the higher the throughput the whole cluster can achieve.

But is more partitions always better? Clearly not, because each partition has its own overhead:
First, the client and the server both need more memory. Start with the client. Kafka 0.8.2 introduced the new Java producer, which has a parameter batch.size, 16 KB by default. The producer buffers messages per partition and sends them as a batch once the buffer is full. This looks like a design that improves performance. But because the parameter applies per partition, the more partitions there are, the more memory this buffering needs. With 10,000 partitions and the default setting, this buffering alone takes about 156 MB of memory (10,000 x 16 KB). Now the consumer side. Leaving aside the memory needed to hold fetched data, consider just the thread overhead. With the same 10,000 partitions and a consumer thread count that matches the partition count (in most cases the configuration with the best consumption throughput), the consumer client creates 10,000 threads and roughly 10,000 sockets to fetch the partition data. The cost of thread context switching alone is then no longer negligible.

The server-side overhead is not small either. If you read the Kafka source, you will find that many server-side components keep partition-level caches in memory, for example the Controller and FetcherManager, so the more partitions there are, the larger these caches become.
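
To make the producer-side arithmetic concrete, here is a minimal sketch. It assumes the simplest possible model, one full batch buffer per partition, and ignores everything else the producer allocates; the object and method names are my own and are not part of Kafka.

```scala
object ProducerBufferEstimate {
  // Rough upper bound: one batch buffer of `batchSizeBytes` per partition.
  // Assumption of this sketch: no other producer memory is counted.
  def bufferBytes(partitions: Int, batchSizeBytes: Int): Long =
    partitions.toLong * batchSizeBytes

  // Convert bytes to mebibytes (1 MiB = 1024 * 1024 bytes).
  def toMiB(bytes: Long): Double = bytes / (1024.0 * 1024.0)

  def main(args: Array[String]): Unit = {
    // 10,000 partitions with the default batch.size of 16 KB
    val bytes = bufferBytes(partitions = 10000, batchSizeBytes = 16 * 1024)
    println(s"${toMiB(bytes)} MiB") // 10,000 * 16 KiB = 156.25 MiB
  }
}
```

The same function can be used to see how quickly the figure grows if batch.size is raised, for example to 64 KB.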
Second, file handle overhead. Each partition has its own directory in the underlying file system. That directory usually contains two files: base_offset.log and base_offset.index. Kafka's Controller and ReplicaManager keep both of these file handles open on each broker. Clearly, the more partitions you have, the more file handles must stay open, and you may eventually exceed your ulimit -n limit.

Third, reduced availability. Kafka ensures high availability through its replica mechanism: it keeps several replicas of each partition (the replication factor specifies how many), each stored on a different broker. One of the replicas acts as the leader and handles producer and consumer requests. The other replicas act as followers and are kept in sync with the leader. If the leader's broker goes down, the controller detects it and elects a new leader with the help of ZooKeeper. This leaves a short window of unavailability, although in most cases it may last only a few milliseconds. But if you have 10,000 partitions and 10 brokers, each broker hosts 1,000 partitions on average. When such a broker goes down, ZooKeeper and the controller must immediately run leader elections for those 1,000 partitions. This inevitably takes longer than electing leaders for a handful of partitions, and the cost often does not grow linearly. It is even worse if the failed broker is also the controller.
After so much "preamble", many readers must be getting impatient. So how do you actually determine the number of partitions? The answer is: it depends. You still need to run a series of experiments and tests, and those tests should be based on throughput. LinkedIn has published a benchmark for Kafka, but its results do not mean much for your setup, because different hardware, software, and load give different results. I often see questions such as: the official site says it can reach 10 MB per second, so why does my producer only reach 1 MB per second? Setting hardware aside, it often turns out that the message body in question is 1 KB while the official benchmark was measured with 100 B messages, so the numbers are not comparable. Still, you can follow these steps to estimate the number of partitions:

Create a topic with only 1 partition, then measure the producer throughput and the consumer throughput on that topic. Call these values Tp and Tc, in a unit such as MB/s.

Then, if the total target throughput is Tt, the number of partitions = max(Tt/Tp, Tt/Tc), that is, Tt divided by the smaller of Tp and Tc, because the slower side is the bottleneck.

Tp is the producer throughput. Testing the producer is usually easy, because its logic is simple: it just sends messages straight to Kafka. Tc is the consumer throughput. Measuring Tc depends much more on the application, because its value depends on what you do with each message after receiving it, so the Tc test is usually more cumbersome.
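
The estimate can be written down as a small function. The sketch below sizes for the slower of the two sides, that is max(Tt/Tp, Tt/Tc) rounded up, so that neither producers nor consumers fall short of the target. The sample numbers are hypothetical, not measurements, and the names are my own.

```scala
object PartitionEstimate {
  // All throughputs must use the same unit, e.g. MB/s.
  //   target: total throughput the topic should sustain (Tt)
  //   tp:     measured producer throughput on a single partition (Tp)
  //   tc:     measured consumer throughput on a single partition (Tc)
  def estimate(target: Double, tp: Double, tc: Double): Int = {
    require(target > 0 && tp > 0 && tc > 0, "throughputs must be positive")
    // The slower side needs more partitions, so it dictates the count.
    math.ceil(math.max(target / tp, target / tc)).toInt
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical figures: 10 MB/s producing, 4 MB/s consuming, 100 MB/s target.
    println(estimate(target = 100.0, tp = 10.0, tc = 4.0)) // 25
  }
}
```

Treat the result as a starting point for further testing rather than a final answer, since throughput does not scale perfectly with the number of partitions.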
In addition, Kafka does not really scale linearly (in fact, no system does), so it is best to plan for somewhat more partitions than you currently need, which also makes future expansion easier.

Message-to-partition assignment

By default, Kafka assigns a message to a partition according to the message key, that is, hash(key) % numPartitions, as the following code shows:
def partition(key: Any, numPartitions: Int): Int = {
    Utils.abs(key.hashCode) % numPartitions
}

This guarantees that messages with the same key are always routed to the same partition. If you do not specify a key, how does Kafka decide which partition a message goes to?

if (key == null) {  // no key was specified
    val id = sendPartitionPerTopicCache.get(topic)  // first check whether a partition ID is already cached
    id match {
        case Some(partitionId) =>
            partitionId  // if so, use that partition ID directly
        case None =>  // if not,
            // find all partitions whose leader broker is available
            val availablePartitions = topicPartitionList.filter(_.leaderBrokerIdOpt.isDefined)
            if (availablePartitions.isEmpty)
                throw new LeaderNotAvailableException("No leader for any partition in topic " + topic)
            val index = Utils.abs(Random.nextInt) % availablePartitions.size  // pick one at random
            val partitionId = availablePartitions(index).partitionId
            sendPartitionPerTopicCache.put(topic, partitionId)  // update the cache for next time
            partitionId
    }
}
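
Both routing rules can be tried out with a small self-contained model. This is a sketch of the behavior described above, not Kafka's actual classes: PartitionerSketch, nonNegative, forKey, forNullKey, and clearCache are hypothetical names, and nonNegative is my own stand-in for Utils.abs.

```scala
import scala.collection.mutable
import scala.util.Random

object PartitionerSketch {
  // Stand-in for Utils.abs (an assumption of this sketch): math.abs(Int.MinValue)
  // is still negative, so that single value is mapped to 0.
  def nonNegative(n: Int): Int = if (n == Int.MinValue) 0 else math.abs(n)

  // Keyed messages: the same key always lands on the same partition.
  def forKey(key: Any, numPartitions: Int): Int =
    nonNegative(key.hashCode) % numPartitions

  // Keyless messages: choose a random available partition once per topic,
  // then keep reusing the cached choice until the cache is cleared.
  private val cache = mutable.Map.empty[String, Int]

  def forNullKey(topic: String, availablePartitions: IndexedSeq[Int]): Int =
    cache.getOrElseUpdate(topic, {
      require(availablePartitions.nonEmpty, s"No leader for any partition in topic $topic")
      availablePartitions(Random.nextInt(availablePartitions.size))
    })

  // Models the periodic refresh that empties the cache.
  def clearCache(): Unit = cache.clear()

  def main(args: Array[String]): Unit = {
    println(forKey("user-42", 8) == forKey("user-42", 8)) // true: same key, same partition
    val first = forNullKey("orders", Vector(0, 1, 2, 3))
    println(first == forNullKey("orders", Vector(0, 1, 2, 3))) // true: cached choice is reused
  }
}
```

One consequence worth noting: until the cache is cleared, every keyless message for a topic goes to the same partition, so a short test run may show all messages landing on a single partition.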

You can see that for a message without a key, Kafka picks a partition almost at random and then caches that partition ID for subsequent sends. Of course, Kafka itself also clears this cache (by default every 10 minutes, or whenever topic metadata is requested).

How to set the number of consumer threads

In my personal opinion, if your topic has N partitions, the best number of threads is also N, which usually achieves maximum throughput. Configuring more than N threads only wastes system resources, because the extra threads are not assigned any partitions. Let's look at how Kafka actually does the assignment.

A partition of a topic can be consumed by only one consumer thread within the same consumer group, but the reverse does not hold: one consumer thread can consume data from multiple partitions. For example, the ConsoleConsumer that ships with Kafka uses a single thread by default to consume the data of all partitions. (In fact, ConsoleConsumer can also consume multiple topics using its wildcard feature, but that is outside the scope of this article.)

Before discussing the assignment strategies, a word about KafkaStream. It is a key consumer class that provides an iteration method which consumer programs call to consume data. Underneath, it maintains a blocking queue, so when no new messages arrive the consumer blocks, and the consumer program appears to be waiting for new messages. You can, of course, configure the consumer with a timeout through the corresponding parameter.

Kafka provides two assignment strategies, range and roundrobin, selected by the parameter partition.assignment.strategy; range is the default. This article discusses only the range strategy. Range assignment simply divides the partitions into contiguous, roughly equal ranges. For example, suppose you have 10 partitions, P0 to P9, and 3 consumer threads, C0 to C2. Which partitions does each thread get?

C0 consumes partitions 0, 1, 2, 3
C1 consumes partitions 4, 5, 6
C2 consumes partitions 7, 8, 9

The specific algorithm is:

// minimum number of partitions each consumer thread consumes
val nPartsPerConsumer = curPartitions.size / curConsumers.size
// number of leftover partitions, handed out one each to the first threads
val nConsumersWithExtraPart = curPartitions.size % curConsumers.size
...
for (consumerThreadId <- consumerThreadIdSet) {  // for each consumer thread
    // position of this thread among all threads, in [0, n-1]
    val myConsumerPosition = curConsumers.indexOf(consumerThreadId)
    assert(myConsumerPosition >= 0)
    // startPart is the first partition this thread will consume
    val startPart = nPartsPerConsumer * myConsumerPosition + myConsumerPosition.min(nConsumersWithExtraPart)
    // nParts is the total number of partitions this thread will consume
    val nParts = nPartsPerConsumer + (if (myConsumerPosition + 1 > nConsumersWithExtraPart) 0 else 1)
    ...
}

For this example, nPartsPerConsumer is 10 / 3 = 3 and nConsumersWithExtraPart is 10 % 3 = 1. This means every thread is guaranteed at least 3 partitions, and the 1 leftover partition goes to the first thread. That is why C0 consumes 4 partitions while the other 2 threads consume 3 partitions each, as the following debug trace shows:

nPartsPerConsumer = 10 / 3 = 3
nConsumersWithExtraPart = 10 % 3 = 1

First iteration: myConsumerPosition = 1
startPart = 3 * 1 + min(1, 1) = 4  (start reading from partition 4)
nParts = 3 + (if (1 + 1 > 1) 0 else 1) = 3  (read 3 partitions: 4, 5, 6)

Second iteration: myConsumerPosition = 0
startPart = 3 * 0 + min(0, 1) = 0  (start reading from partition 0)
nParts = 3 + (if (0 + 1 > 1) 0 else 1) = 4  (read 4 partitions: 0, 1, 2, 3)

Third iteration: myConsumerPosition = 2
startPart = 3 * 2 + min(2, 1) = 7  (start reading from partition 7)
nParts = 3 + (if (2 + 1 > 1) 0 else 1) = 3  (read 3 partitions: 7, 8, 9)

At this point all 10 partitions have been allocated.
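
The same arithmetic can be run as a standalone program. This is a minimal sketch of range assignment that reuses the variable names from the excerpt above; the RangeAssignmentSketch object and its assign method are hypothetical, and the thread list is assumed to be already sorted.

```scala
object RangeAssignmentSketch {
  // Maps each consumer thread to the contiguous slice of partitions it owns.
  // Assumption: `threads` is already in the order used for positions.
  def assign(partitions: IndexedSeq[Int], threads: IndexedSeq[String]): Map[String, IndexedSeq[Int]] = {
    require(threads.nonEmpty, "need at least one consumer thread")
    val nPartsPerConsumer = partitions.size / threads.size
    val nConsumersWithExtraPart = partitions.size % threads.size
    threads.zipWithIndex.map { case (thread, pos) =>
      // The first nConsumersWithExtraPart threads each take one extra partition.
      val startPart = nPartsPerConsumer * pos + math.min(pos, nConsumersWithExtraPart)
      val nParts = nPartsPerConsumer + (if (pos + 1 > nConsumersWithExtraPart) 0 else 1)
      thread -> partitions.slice(startPart, startPart + nParts)
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    val result = assign(0 until 10, Vector("C0", "C1", "C2"))
    result.toSeq.sortBy(_._1).foreach { case (thread, parts) =>
      println(s"$thread -> ${parts.mkString(", ")}")
    }
    // C0 -> 0, 1, 2, 3
    // C1 -> 4, 5, 6
    // C2 -> 7, 8, 9
  }
}
```

Calling assign with 12 threads and the same 10 partitions gives the last two threads an empty slice, which illustrates why threads beyond the partition count sit idle.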
There is one requirement that comes up again and again here: I want a particular consumer thread to consume only the partitions I specify and no others. Frankly, Kafka does not currently provide a way to plug in a custom assignment strategy, so this is hard to do. But on reflection, perhaps we expect too much of Kafka. After all, it is only a messaging engine, and the logic for consuming messages is perhaps not something Kafka itself should have to handle.
