[Repost] How to determine the number of partitions, keys, and consumer threads for Kafka

Article source: http://www.cnblogs.com/huxi2b/p/4583249.html

In the QQ groups of the Kafka Chinese community, this question comes up remarkably often; it is one of the problems Kafka users hit most frequently. This article draws on the Kafka source code to discuss the factors behind it, in the hope of being helpful.
How do I determine the number of partitions?

"How many partitions should I choose?" If you spend any time in the QQ groups of the Kafka Chinese community, you will run into this question constantly. Unfortunately, there seems to be no authoritative answer, which is not surprising: such questions rarely have a fixed answer. Kafka's website advertises it as a "high-throughput distributed messaging system". How does it achieve high throughput? At the bottom layer, Kafka sidesteps the Java heap cache in favor of the operating system's page cache, turns random writes into sequential writes, and combines this with zero-copy transfers, which together greatly improve I/O performance. But that is only one side of the story; single-machine optimization has a ceiling. How do you raise throughput further by scaling horizontally, even linearly? Kafka's answer is partitioning (partition): by splitting a topic's messages across multiple partitions distributed over different brokers, it achieves high message-processing throughput for producers and consumers alike. Kafka's producers and consumers can both operate in parallel with multiple threads, each thread processing one partition's data, so the partition is the smallest unit of parallelism tuning in Kafka. A producer, in effect, uses multiple threads to open socket connections to the brokers hosting the different partitions and sends messages to those partitions concurrently; on the consumer side, all consumer threads in the same consumer group divide the specified topic's partitions among themselves, each consuming from its assigned partition (exactly how the number of consumer threads is determined is explained in detail later). So, the more partitions a topic has, the more throughput the whole cluster can theoretically achieve.

But is a larger partition count always better? Clearly not, because each partition carries its own overhead:

1. The more partitions, the more memory the clients and the server need.

Consider the client side first. Kafka 0.8.2 introduced the new Java producer, which has a parameter batch.size, 16KB by default. The producer buffers messages per partition and ships a batch once the buffer fills. This looks like a throughput-friendly design, but because the parameter applies per partition, a larger partition count means more memory spent on these buffers. With 10,000 partitions, this buffering alone consumes about 157MB (10,000 x 16KB) by default. And the consumer side? Setting aside the memory needed to hold fetched data, consider just the thread overhead. With those same 10,000 partitions, and a consumer thread count matching the partition count (in most cases the optimal configuration for consumption throughput), the consumer client would create 10,000 threads, plus roughly 10,000 sockets to fetch partition data. The cost of all that thread switching is no longer negligible. The server-side cost is not small either: reading the Kafka source code reveals that many server components, such as the controller and the FetcherManager, maintain per-partition caches in memory, so the more partitions there are, the larger these caches become.
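To make the client-side point concrete, here is a minimal sketch (in Scala, against the new Java producer's public API) of where batch.size enters the picture; the broker address, topic name, key, and value are placeholders:

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")  // placeholder broker address
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("batch.size", "16384")  // the 16KB default spelled out; this much is buffered per partition

val producer = new KafkaProducer[String, String](props)
producer.send(new ProducerRecord[String, String]("my-topic", "some-key", "some-value"))
producer.close()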
2. File handle overhead.

Each partition has its own directory in the underlying file system, and that directory usually contains two files: base_offset.log and base_offset.index. Kafka's controller and ReplicaManager hold these file handles for every broker. Clearly, the more partitions there are, the more file handles must be kept open, and you may eventually blow past your ulimit -n limit.

3. More partitions reduce high availability.

Kafka guarantees high availability through its replica mechanism: each partition keeps several copies (the replication factor specifies how many), each stored on a different broker. One replica acts as the leader and handles producer and consumer requests; the others act as followers, and the Kafka controller is responsible for keeping them in sync with the leader. If the broker hosting a leader goes down, the controller detects it and, with ZooKeeper's help, elects a new leader. This creates a short window of unavailability, though in most cases it may last only a few milliseconds. But suppose you have 10,000 partitions on 10 brokers, i.e. 1,000 partitions per broker on average. When one broker dies, ZooKeeper and the controller must promptly elect new leaders for those 1,000 partitions. This inevitably takes longer than leader election for a small number of partitions, and the delay usually does not accumulate linearly. It is worse still if the dead broker also happened to be the controller.

Having said all this "nonsense", many readers must be getting impatient: so how do you determine the partition count after all? The answer is: it depends. Fundamentally you still have to determine it through a series of experiments and tests, and the tests should be throughput-driven. Although LinkedIn has published Kafka benchmarks, their numbers mean little for your setup, because results inevitably differ with hardware, software, and load. I often see questions like "the website says it can reach 10MB per second, why does my producer only do 1MB per second?", and leaving hardware aside, it usually turns out the asker tested with 1KB messages while the website's benchmark used 100B messages, so the numbers are simply not comparable.

Still, you can follow a procedure to estimate the partition count. Create a topic with only 1 partition, then measure that topic's producer throughput and consumer throughput. Say the measured values are Tp and Tc respectively, in MB/s, and your total target throughput is Tt. Then:

number of partitions = Tt / max(Tp, Tc)

Tp is the producer throughput. Testing the producer is usually easy, because its logic is very simple: it just sends messages straight to Kafka. Tc is the consumer throughput. Testing Tc is usually more application-bound, because Tc depends on what you do with each message after fetching it, so it is generally more cumbersome to measure. In addition, Kafka does not really scale linearly (in fairness, no system does), so build headroom into your planned partition count to make future expansion more convenient.
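A trivial worked instance of this sizing rule (all numbers invented purely for illustration):

// Measured on a 1-partition test topic (made-up numbers):
val tp = 20.0   // producer throughput, MB/s
val tc = 25.0   // consumer throughput, MB/s
val tt = 100.0  // total target throughput, MB/s

// Partition count per the rule above, rounded up for headroom:
val numPartitions = math.ceil(tt / math.max(tp, tc)).toInt  // ceil(100 / 25) = 4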
How are messages allocated to partitions?

By default, Kafka allocates a message to a partition based on its key, i.e. hash(key) % numPartitions, as shown here:
def partition(key: Any, numPartitions: Int): Int = {
  Utils.abs(key.hashCode) % numPartitions
}
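A minimal self-contained check of this rule (the abs below is written to mimic Utils.abs in the Kafka source, which I take to mask the sign bit rather than negate, so even Int.MinValue is handled safely; treat that as an assumption):

def abs(n: Int): Int = n & 0x7fffffff  // assumed equivalent of Kafka's Utils.abs
def partitionFor(key: Any, numPartitions: Int): Int = abs(key.hashCode) % numPartitions

// The same key always lands in the same partition:
partitionFor("user-42", 5) == partitionFor("user-42", 5)  // true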
This guarantees that messages with the same key are always routed to the same partition. But if you don't specify a key, how does Kafka decide which partition a message goes to?
if(key == null) {  // no key was specified
  val id = sendPartitionPerTopicCache.get(topic)  // first check whether Kafka has a cached, ready-made partition id
  id match {
    case Some(partitionId) =>
      partitionId  // if there is one, use that partition id directly
    case None =>  // otherwise, find the partitions whose leader broker is currently available
      val availablePartitions = topicPartitionList.filter(_.leaderBrokerIdOpt.isDefined)
      if (availablePartitions.isEmpty)
        throw new LeaderNotAvailableException("No leader for any partition in topic " + topic)
      val index = Utils.abs(Random.nextInt) % availablePartitions.size  // randomly pick one of them
      val partitionId = availablePartitions(index).partitionId
      sendPartitionPerTopicCache.put(topic, partitionId)  // update the cache for direct use next time
      partitionId
  }
}
As you can see, for a message without a key Kafka picks a partition essentially at random, then caches that partition number for later use. Of course, Kafka itself also clears this cache (by default every 10 minutes, or whenever topic metadata is requested; in the old producer this corresponds to the topic.metadata.refresh.interval.ms setting).
How to set the number of consumer threads

My personal view: if a topic has N partitions, the best thread count is also N, which usually achieves maximum throughput. Configuring more than N threads only wastes system resources, because the extra threads are never assigned any partition. Let's look at how Kafka actually performs the assignment. A partition of a topic can be consumed by only one consumer thread within a given consumer group, but the reverse does not hold: one consumer thread may consume data from multiple partitions. For example, the ConsoleConsumer shipped with Kafka uses, by default, a single thread to consume all partitions. (In fact, ConsoleConsumer can use wildcards to consume several topics at once, but that is beside the point here.) Before discussing the assignment strategy, a word about KafkaStream: it is the consumer's key class, providing an iterator for the consumer program to traverse and consume data. Underneath, it maintains a blocking queue, so when no new message has arrived, the consumer blocks, which appears as the consumer program waiting for new messages. (You can of course configure a timeout on the consumer; see the parameter consumer.timeout.ms.)

Kafka provides two assignment strategies, range and roundrobin, selected via the parameter partition.assignment.strategy; range is the default and is the only one discussed in this article. Range assignment simply carves the partitions into contiguous segments. As an example, suppose you have 10 partitions, P0 through P9, and 3 consumer threads, C0 through C2. Which partitions does each thread get?

C0 consumes partitions 0, 1, 2, 3
C1 consumes partitions 4, 5, 6
C2 consumes partitions 7, 8, 9

The specific algorithm is:
val nPartsPerConsumer = curPartitions.size / curConsumers.size  // minimum number of partitions each consumer is guaranteed
val nConsumersWithExtraPart = curPartitions.size % curConsumers.size  // number of leftover partitions, handed to the first few threads
...
for (consumerThreadId <- consumerThreadIdSet) {  // for each consumer thread
  val myConsumerPosition = curConsumers.indexOf(consumerThreadId)  // this thread's position among all threads, in [0, n-1]
  assert(myConsumerPosition >= 0)
  // startPart is the first partition this thread will consume
  val startPart = nPartsPerConsumer * myConsumerPosition + myConsumerPosition.min(nConsumersWithExtraPart)
  // nParts is the total number of partitions this thread will consume
  val nParts = nPartsPerConsumer + (if (myConsumerPosition + 1 > nConsumersWithExtraPart) 0 else 1)
  ...
}
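To replay this arithmetic with other partition and thread counts, here is a standalone re-implementation of the range logic (a sketch distilled from the snippet above, not Kafka's actual code; consumer thread ids are reduced to plain positions):

def rangeAssign(numPartitions: Int, numConsumers: Int): Map[Int, Seq[Int]] = {
  val nPartsPerConsumer = numPartitions / numConsumers
  val nConsumersWithExtraPart = numPartitions % numConsumers
  (0 until numConsumers).map { pos =>
    val startPart = nPartsPerConsumer * pos + pos.min(nConsumersWithExtraPart)
    val nParts = nPartsPerConsumer + (if (pos + 1 > nConsumersWithExtraPart) 0 else 1)
    pos -> (startPart until startPart + nParts)  // the contiguous range this consumer gets
  }.toMap
}

// rangeAssign(10, 3) yields: 0 -> [0,1,2,3], 1 -> [4,5,6], 2 -> [7,8,9]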
For this example, nPartsPerConsumer is 10/3 = 3 and nConsumersWithExtraPart is 10%3 = 1, meaning every thread is guaranteed at least 3 partitions and the 1 leftover partition goes to the first thread. This is why C0 consumes 4 partitions while the other 2 threads consume 3 each. The concrete process is shown in the following debug information:
ctx.myTopicThreadIds

nPartsPerConsumer = 10 / 3 = 3
nConsumersWithExtraPart = 10 % 3 = 1

First iteration:
myConsumerPosition = 1
startPart = 1 * 3 + min(1, 1) = 4  --- start reading from partition 4
nParts = 3 + (if (1 + 1 > 1) 0 else 1) = 3  --- read 3 partitions: 4, 5, 6

Second iteration:
myConsumerPosition = 0
startPart = 3 * 0 + min(0, 1) = 0  --- start reading from partition 0
nParts = 3 + (if (0 + 1 > 1) 0 else 1) = 4  --- read 4 partitions: 0, 1, 2, 3

Third iteration:
myConsumerPosition = 2
startPart = 3 * 2 + min(2, 1) = 7  --- start reading from partition 7
nParts = 3 + (if (2 + 1 > 1) 0 else 1) = 3  --- read 3 partitions: 7, 8, 9

At this point all 10 partitions have been allocated.

In practice, there is often a need to have a particular consumer thread consume specified partitions and no others. Frankly speaking, Kafka currently does not provide a custom allocation strategy; it is simply hard to do. But when you think about it, perhaps we expect too much of Kafka: it is, after all, just a messaging engine, and pushing message-consumption logic into Kafka may not be Kafka's job.
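Finally, to tie the thread-count discussion together, here is a minimal sketch of driving the old high-level (Scala) consumer with one KafkaStream per thread; the ZooKeeper address, group id, topic name, and thread count are all placeholders:

import java.util.Properties
import java.util.concurrent.Executors
import kafka.consumer.{Consumer, ConsumerConfig}

val props = new Properties()
props.put("zookeeper.connect", "localhost:2181")  // placeholder
props.put("group.id", "test-group")               // placeholder
val connector = Consumer.create(new ConsumerConfig(props))

val topic = "my-topic"
val numThreads = 3  // ideally equal to the topic's partition count, as argued above
val streams = connector.createMessageStreams(Map(topic -> numThreads))(topic)

val pool = Executors.newFixedThreadPool(numThreads)
for (stream <- streams) {
  pool.submit(new Runnable {
    override def run(): Unit = {
      // iterating a KafkaStream blocks until a message arrives
      // (or throws ConsumerTimeoutException if consumer.timeout.ms is set)
      for (m <- stream)
        println(s"partition=${m.partition} offset=${m.offset}")
    }
  })
}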
Comments:

#1 2015-09-20 10:30 | Man_Hua
"Let one consumer thread consume the specified partition and not consume the other partition" already have, specify partition_id can, see Kafka-python
self.consumer = KafkaConsumer((topic, int(partition_id)),
                              group_id=gid,
                              bootstrap_servers=[bs_server])

#2 [author] 2015-09-23 06:55 | Hu Xi
@Man_Hua
Hmm, I did not state this clearly here. It is true that SimpleConsumer can be used for fine-grained control; what I meant is that the high-level consumer API cannot do this, which is exactly why Kafka 0.9 offers a new version of the consumer.
"Go" How to determine the number of partitions, keys, and consumer threads for Kafka