How to determine the number of partitions, keys, and consumer threads for Kafka

Last Update:2017-06-26 Source: Internet

Author: User



Transferred from: HTTP://WWW.TUICOOL.COM/ARTICLES/AJ6FAJ3

In the QQ groups of the Kafka Chinese community this question comes up a great deal; it is one of the problems Kafka users run into most often. This article draws on the Kafka source code to discuss the factors involved. We hope it helps.

How do I determine the number of partitions?

"How many partitions should I choose?" If you spend time in the Kafka Chinese community groups, you will meet this question often. Unfortunately, there does not seem to be an authoritative answer. That is not surprising: questions like this rarely have a fixed answer. The Kafka website describes Kafka as a "high-throughput distributed messaging system". So how does it achieve high throughput? At the lowest level, Kafka abandons Java heap caching in favor of the operating system's page cache, turns random writes into sequential writes, and uses zero-copy transfer, which together greatly improve I/O performance. But this is only one aspect; there is a ceiling on what single-machine optimization can do. How can throughput be increased further by scaling horizontally, ideally linearly? Kafka's answer is partitioning: a topic's messages are split across multiple partitions that are distributed over different brokers, which gives high throughput for both producers and consumers.
Both producers and consumers in Kafka can work in parallel across multiple threads, with each thread handling one partition's data. The partition is therefore the smallest unit for tuning Kafka's parallelism. A producer uses multiple threads to open socket connections to the brokers hosting different partitions and sends messages to those partitions simultaneously. On the consumer side, each consumer thread in a consumer group is assigned specific partitions of the topic to consume (how to choose the number of consumer threads is explained in detail later). So, in theory, the more partitions a topic has, the more throughput the whole cluster can achieve.
But is more partitions always better? Obviously not, because each partition has its own overhead:
First, the client and the server need more memory. Consider the client first. Kafka 0.8.2 introduced the new Java producer, which has a parameter batch.size, 16KB by default. The producer buffers messages per partition and sends them as a batch once the buffer is full. This looks like a design that improves performance. However, because the parameter applies per partition, the more partitions there are, the more memory these buffers need. With 10,000 partitions and the default setting, the buffers consume roughly 156MB of memory. And the consumer? Set aside the memory needed to fetch data and consider only the thread overhead. With the same 10,000 partitions and a matching number of consumer threads (in most cases the configuration that gives the best consumer throughput), the consumer client creates 10,000 threads and about 10,000 sockets to fetch the partition data. The cost of thread context switching is then no longer negligible. The server-side overhead is not small either. Reading the Kafka source shows that many server components, such as the controller and FetcherManager, keep partition-level caches in memory, so the more partitions there are, the larger these caches become.
Second, file handle overhead. Each partition has its own directory in the underlying file system, and that directory usually holds two files: base_offset.log and base_offset.index. Kafka's controller and ReplicaManager keep both file handles open for each broker. Obviously, the more partitions there are, the more file handles must stay open, and you may eventually exceed your ulimit -n limit.
Third, reduced availability. Kafka ensures high availability through its replica mechanism: it keeps several copies of each partition (replica_factor specifies the number of replicas), each on a different broker. One replica acts as the leader and handles producer and consumer requests. The other replicas act as followers, and the Kafka controller is responsible for keeping them in sync with the leader. If the broker hosting the leader goes down, the controller detects this and, with the help of ZooKeeper, elects a new leader. During the election there is a short window of unavailability, although in most cases it may last only a few milliseconds. But suppose you have 10,000 partitions and 10 brokers, so on average 1,000 partitions per broker. When one broker dies, ZooKeeper and the controller must immediately run leader elections for those 1,000 partitions. This necessarily takes longer than electing leaders for a handful of partitions, and the time usually does not add up linearly. It is even worse if the failed broker was also the controller.
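
The three kinds of overhead above can be put into rough numbers. The following is a minimal back-of-the-envelope sketch, not Kafka code: the object and method names are made up for illustration, and it assumes one full batch per partition on the producer and one log segment (one .log file plus one .index file) per partition replica on the broker.

```scala
// Back-of-the-envelope estimates of per-partition overhead.
// All names here are illustrative; none of them are Kafka APIs.
object PartitionOverhead {
  // Default producer batch.size mentioned in the text (16KB).
  val BatchSizeBytes: Long = 16L * 1024

  // Worst-case producer buffer memory in MiB, assuming every partition
  // holds exactly one full batch.
  def producerBufferMiB(partitions: Int, batchSizeBytes: Long = BatchSizeBytes): Double =
    partitions * batchSizeBytes / (1024.0 * 1024.0)

  // Open file handles on one broker, assuming replicas are spread evenly and
  // each replica has a single segment (.log + .index = 2 files).
  def fileHandlesPerBroker(partitions: Int, replicationFactor: Int, brokers: Int): Long =
    2L * partitions * replicationFactor / brokers

  // Average number of partitions per broker, i.e. how many leader elections
  // one broker failure can trigger in the scenario described above.
  def partitionsPerBroker(partitions: Int, brokers: Int): Int =
    partitions / brokers
}
```

For 10,000 partitions this gives about 156 MiB of producer buffers and 1,000 partitions per broker on a 10-broker cluster; with an assumed replication factor of 3, each broker would hold around 6,000 open file handles under these simplifications.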
After so much "nonsense", many readers are surely getting impatient. So how do you actually determine the number of partitions? The answer: it depends. You still need to run a series of experiments and tests, and those tests should be based on throughput. LinkedIn has published a Kafka benchmark, but its results mean little for you, because results inevitably vary with hardware, software, and load. I often get questions like: "The official website says it can reach 10MB per second, so why does my producer only manage 1MB per second?" Leaving hardware aside, it usually turns out that the person was sending 1KB message bodies, while the benchmark on the official website was measured with 100B messages, so the numbers are not comparable. Still, you can follow a few steps to estimate the number of partitions: create a topic with only 1 partition and measure the producer throughput and consumer throughput of that topic. Call these values Tp and Tc; the unit can be MB/s. If the total target throughput is Tt, then number of partitions = Tt / min(Tp, Tc), which is the same as max(Tt / Tp, Tt / Tc): the slower of the two sides determines how many partitions are needed.
Tp is the producer throughput. Testing the producer is usually easy because its logic is simple: it just sends messages to Kafka. Tc is the consumer throughput. Measuring Tc depends more on the application, because its value depends on what you do with each message after receiving it, so Tc is usually more cumbersome to test.
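
As a small worked example of this estimate, here is a sketch under one stated assumption: the partition count has to satisfy whichever side (producer or consumer) is slower per partition, so it divides the target by min(Tp, Tc) and rounds up. The object name, the method name, and the sample numbers are hypothetical.

```scala
// Rough partition-count estimate from measured single-partition throughput.
// Illustrative helper only; not part of Kafka.
object PartitionCount {
  // tt: target total throughput; tp / tc: measured producer and consumer
  // throughput on a 1-partition topic. All in the same unit, e.g. MB/s.
  def estimate(tt: Double, tp: Double, tc: Double): Int = {
    require(tt > 0 && tp > 0 && tc > 0, "throughputs must be positive")
    // The slower side needs the most partitions to reach the target,
    // so divide by the smaller per-partition throughput and round up.
    math.ceil(tt / math.min(tp, tc)).toInt
  }
}
```

For instance, PartitionCount.estimate(100, 30, 8) returns 13: a consumer that handles 8 MB/s per partition needs 12.5 partitions, rounded up, to keep pace with a 100 MB/s target. Treat the result as a starting point for further testing, since throughput does not scale perfectly linearly with partitions.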
In addition, Kafka does not scale perfectly linearly (in fact, no system does), so when planning the number of partitions it is best to plan for somewhat more than you need now, which makes future expansion easier.

Allocating messages to partitions

By default, Kafka assigns a message to a partition based on the message key, that is, hash(key) % numPartitions, as shown here:

def partition(key: Any, numPartitions: Int): Int = {
  Utils.abs(key.hashCode) % numPartitions
}

This ensures that messages with the same key are always routed to the same partition. If you don't specify a key, how does Kafka decide which partition the message goes to?

if (key == null) {  // no key was specified
  val id = sendPartitionPerTopicCache.get(topic)  // check whether a partition ID is already cached for this topic
  id match {
    case Some(partitionId) =>
      partitionId  // a cached partition ID exists, so use it directly
    case None =>  // nothing cached yet
      // find the partitions whose leader broker is currently known
      val availablePartitions = topicPartitionList.filter(_.leaderBrokerIdOpt.isDefined)
      if (availablePartitions.isEmpty)
        throw new LeaderNotAvailableException("No leader for any partition in topic " + topic)
      // pick one of the available partitions at random
      val index = Utils.abs(Random.nextInt) % availablePartitions.size
      val partitionId = availablePartitions(index).partitionId
      // update the cache so the next send can use it directly
      sendPartitionPerTopicCache.put(topic, partitionId)
      partitionId
  }
}

As the code shows, for a message without a key Kafka picks a partition almost at random and then stores that partition ID in the cache for direct reuse. Kafka also clears this cache itself (by default every 10 minutes, or whenever topic metadata is requested).
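
To make both paths concrete, here is a simplified, self-contained model of the behaviour described above. It is a sketch, not Kafka's implementation: the class SketchPartitioner, its methods, and the clearCache stand-in for the periodic metadata refresh are all invented for illustration, and it ignores leader availability.

```scala
import scala.collection.mutable
import scala.util.Random

// Simplified model of keyed vs. keyless partition selection.
// Hypothetical class for illustration; not a Kafka API.
class SketchPartitioner(numPartitions: Int, random: Random = new Random()) {
  require(numPartitions > 0, "numPartitions must be positive")

  // topic -> partition chosen for keyless messages, kept until the cache is cleared
  private val stickyCache = mutable.Map.empty[String, Int]

  // math.abs(Int.MinValue) is still negative, so that value is special-cased.
  private def abs(n: Int): Int = if (n == Int.MinValue) 0 else math.abs(n)

  def partition(topic: String, key: Option[Any]): Int = key match {
    // Keyed message: the same key always maps to the same partition.
    case Some(k) => abs(k.hashCode) % numPartitions
    // Keyless message: reuse the cached partition, or pick one at random and cache it.
    case None => stickyCache.getOrElseUpdate(topic, abs(random.nextInt()) % numPartitions)
  }

  // Stand-in for the periodic refresh that empties the cache.
  def clearCache(): Unit = stickyCache.clear()
}
```

One consequence worth noting: between cache refreshes, all keyless messages for a topic from one producer land on a single partition, so short tests with keyless messages can show very uneven partition usage.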

How to set the number of consumer threads

My personal view is that if a topic has N partitions, the best number of consumer threads is also N, which usually achieves maximum throughput. Configuring more than N threads only wastes system resources, because the extra threads are not assigned any partitions. Let's look at how Kafka actually does the assignment.

A partition of a topic can be consumed by only one consumer thread within the same consumer group, but the reverse does not hold: one consumer thread can consume data from multiple partitions. For example, the ConsoleConsumer that ships with Kafka by default uses a single thread to consume all partitions. (In fact, ConsoleConsumer can also use wildcards to consume several topics at once, but that is beyond the scope of this article.)

Before discussing the assignment strategies, a word about KafkaStream. It is the key class on the consumer side: it provides an iterator that consumer programs call to consume data. Underneath, it maintains a blocking queue, so when no new messages arrive the consumer blocks, and the consumer program appears to be waiting for new messages. You can of course configure a timeout for the consumer; see the parameter consumer.timeout.ms.

Kafka provides two assignment strategies, range and roundrobin, selected with the parameter partition.assignment.strategy; range is the default. This article discusses only the range strategy. "Range" simply means dividing the partitions into contiguous segments. For example, suppose you have 10 partitions, P0 ~ P9, and 3 consumer threads, C0 ~ C2. Which partitions does each thread get?

C0 consumes partitions 0, 1, 2, 3
C1 consumes partitions 4, 5, 6
C2 consumes partitions 7, 8, 9

The specific algorithm is:

// minimum number of partitions each consumer thread gets
val nPartsPerConsumer = curPartitions.size / curConsumers.size
// number of leftover partitions, handed out one each to the first threads
val nConsumersWithExtraPart = curPartitions.size % curConsumers.size
...
for (consumerThreadId <- consumerThreadIdSet) {  // for each consumer thread
  // position of this thread among all threads, in [0, n-1]
  val myConsumerPosition = curConsumers.indexOf(consumerThreadId)
  assert(myConsumerPosition >= 0)
  // startPart is the first partition this thread consumes
  val startPart = nPartsPerConsumer * myConsumerPosition + myConsumerPosition.min(nConsumersWithExtraPart)
  // nParts is the total number of partitions this thread consumes
  val nParts = nPartsPerConsumer + (if (myConsumerPosition + 1 > nConsumersWithExtraPart) 0 else 1)
  ...
}

For this example, nPartsPerConsumer is 10 / 3 = 3 and nConsumersWithExtraPart is 10 % 3 = 1. This means every thread is guaranteed at least 3 partitions, and the 1 remaining partition goes to the thread at the front of the list. That is why C0 consumes 4 partitions while the following 2 threads consume 3 each. The detailed process is shown in the debug information below:

ctx.myTopicThreadIds

nPartsPerConsumer = 10 / 3 = 3
nConsumersWithExtraPart = 10 % 3 = 1

First iteration: myConsumerPosition = 1
startPart = 3 * 1 + min(1, 1) = 4 (start reading from partition 4)
nParts = 3 + (if (1 + 1 > 1) 0 else 1) = 3 (reads 3 partitions: 4, 5, 6)

Second iteration: myConsumerPosition = 0
startPart = 3 * 0 + min(0, 1) = 0 (start reading from partition 0)
nParts = 3 + (if (0 + 1 > 1) 0 else 1) = 4 (reads 4 partitions: 0, 1, 2, 3)

Third iteration: myConsumerPosition = 2
startPart = 3 * 2 + min(2, 1) = 7 (start reading from partition 7)
nParts = 3 + (if (2 + 1 > 1) 0 else 1) = 3 (reads 3 partitions: 7, 8, 9)

With that, all 10 partitions have been assigned.
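
The same arithmetic can be run as a small standalone program. This is a sketch that reuses the two formulas from the excerpt but replaces Kafka's data structures with plain integers; the object RangeAssignment and its assign method are hypothetical names, and consumer threads are identified only by their position.

```scala
// Standalone version of the range calculation, using plain integers.
// Illustrative only; not Kafka's own assignor class.
object RangeAssignment {
  // Returns, for each consumer position 0 until numConsumers,
  // the contiguous range of partitions assigned to that thread.
  def assign(numPartitions: Int, numConsumers: Int): Seq[Range] = {
    require(numPartitions >= 0 && numConsumers > 0, "need at least one consumer")
    val nPartsPerConsumer = numPartitions / numConsumers
    val nConsumersWithExtraPart = numPartitions % numConsumers
    (0 until numConsumers).map { pos =>
      // First partition for this thread: its base share plus one for each
      // earlier thread that received an extra partition.
      val startPart = nPartsPerConsumer * pos + pos.min(nConsumersWithExtraPart)
      // The first nConsumersWithExtraPart threads get one extra partition.
      val nParts = nPartsPerConsumer + (if (pos + 1 > nConsumersWithExtraPart) 0 else 1)
      startPart until (startPart + nParts)
    }
  }
}
```

RangeAssignment.assign(10, 3) yields the ranges 0 to 3, 4 to 6, and 7 to 9, matching the walk-through. Calling it with more threads than partitions, for example assign(2, 3), leaves the last thread with an empty range, which is the wasted-thread case mentioned earlier.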
In practice there is often a need to have a particular consumer thread consume specific partitions and no others. Frankly, Kafka does not currently provide a way to plug in a custom assignment policy. That is hard to achieve, but on reflection perhaps we expect too much of Kafka: it is only a messaging engine, after all, and message-consumption logic may not be something Kafka should take on.


This article is an English version of an article originally published in Chinese on aliyun.com.
