This article is divided into three parts:
- Kafka Topic Creation Method
- Kafka Topic Partition Replica Assignment Implementation Principle
- Kafka Resource Isolation Scheme
1. Kafka Topic Creation Method

A Kafka topic can be created in two ways.

(1) Specify the storage mapping between the topic's partition replicas and the brokers explicitly when creating the topic:

    /usr/lib/kafka_2.10-0.8.2.1/bin/kafka-topics.sh --zookeeper zookeeperhost:zookeeperport --create --topic topicname --replica-assignment id0:id1:id2,id3:id4:id5,id6:id7:id8

Here "id0:id1:id2,id3:id4:id5,id6:id7:id8" means that topic topicname has 3 partitions in total (separated by ","), and each partition has 3 replicas (separated by ":"). The correspondence between partition replicas and Kafka brokers is:

    partition0 replicas: broker id0, broker id1, broker id2
    partition1 replicas: broker id3, broker id4, broker id5
    partition2 replicas: broker id6, broker id7, broker id8

(2) Let Kafka assign the storage mapping between partition replicas and brokers automatically when the topic is created:

    /usr/lib/kafka_2.10-0.8.2.1/bin/kafka-topics.sh --zookeeper zookeeperhost:zookeeperport --create --topic topicname

In this mode the partition count and the replica count are supplied with the --partitions and --replication-factor options.

Method (1) depends entirely on manual specification. The rest of this article discusses only how the automatic assignment of method (2) is implemented.

2. Kafka Topic Partition Replica Assignment Implementation Principle

Replica assignment has two goals:
(1) distribute partition replicas evenly across all Kafka brokers (load balancing);
(2) if the first replica of a partition is assigned to one broker, assign the other replicas of that partition to other brokers, so that the replicas of one partition always sit on different brokers.

Note the constraint this implies: the number of replicas per partition must be <= the number of Kafka brokers.

The core process of automatic assignment is as follows. Randomly select a startingBroker (broker id0, broker id1, broker id2, ...) and randomly select the initial value of increasingShift (in [0, nBrokers-1]). Then:
(1) starting from startingBroker, assign the replicas of each partition to the brokers in round-robin order; for each partition the replicas are assigned as in steps (2) to (5);
(2) assign the partition's 1st replica to startingBroker;
(3) compute the shift (the distance from the 1st replica) of the nth (n >= 2) replica from increasingShift, and assign that replica to the broker at that distance;
(4) move startingBroker to the next broker;
(5) if all brokers have been visited once, increment increasingShift by one; then continue with (2) for the next partition.

Assume there are 5 brokers (broker-0, broker-1, broker-2, broker-3, broker-4) and the topic has 10 partitions (p0, p1, p2, p3, p4, p5, p6, p7, p8, p9), each with 3 replicas. Following the process above, the result is:

    broker-0  broker-1  broker-2  broker-3  broker-4
    p0        p1        p2        p3        p4        (1st replica)
    p5        p6        p7        p8        p9        (1st replica)
    p4        p0        p1        p2        p3        (2nd replica)
    p8        p9        p5        p6        p7        (2nd replica)
    p3        p4        p0        p1        p2        (3rd replica)
    p7        p8        p9        p5        p6        (3rd replica)

The detailed steps are as follows. Select broker-0 as startingBroker, with increasingShift initially 0, so the 2nd replica is placed 1 broker after the 1st and the 3rd replica 2 brokers after it:

    p0: replica1 -> broker-0, replica2 -> broker-1, replica3 -> broker-2
    p1: replica1 -> broker-1, replica2 -> broker-2, replica3 -> broker-3
    p2: replica1 -> broker-2, replica2 -> broker-3, replica3 -> broker-4
    p3: replica1 -> broker-3, replica2 -> broker-4, replica3 -> broker-0
    p4: replica1 -> broker-4, replica2 -> broker-0, replica3 -> broker-1

Note: increasingShift is used to compute the shift of the nth (n >= 2) replica of a partition, where shift is the distance between that replica and the 1st replica. If increasingShift is M, the 2nd replica is M + 1 brokers after the 1st replica, the 3rd replica is M + 2 brokers after it, and so on, wrapping around so that shift always stays in the range [1, nBrokers-1].

At this point broker-0, broker-1, broker-2, broker-3 and broker-4 have each served as startingBroker once. The round-robin continues, but increasingShift is incremented to 1, so the 2nd replica is now placed 2 brokers after the 1st and the 3rd replica 3 brokers after it:

    p5: replica1 -> broker-0, replica2 -> broker-2, replica3 -> broker-3
    p6: replica1 -> broker-1, replica2 -> broker-3, replica3 -> broker-4
    p7: replica1 -> broker-2, replica2 -> broker-4, replica3 -> broker-0
    p8: replica1 -> broker-3, replica2 -> broker-0, replica3 -> broker-1
    p9: replica1 -> broker-4, replica2 -> broker-1, replica3 -> broker-2

Now all five brokers have served as startingBroker a second time. If there were more partitions, the round-robin would continue with increasingShift incremented to 2, and so on.

Two points deserve attention.

(1) Why choose startingBroker randomly instead of always starting from broker-0? Take broker-0, broker-1, broker-2, broker-3, broker-4 as an example. Because assignment is done round-robin, always starting from broker-0 means that the brokers at the front of the list tend to receive more partition replicas, and therefore carry a higher load. Random selection gives a more uniform distribution.

(2) Why is increasingShift incremented by 1 each time the broker list has been fully visited? When a topic has a large number of partitions, this makes the distance between the 1st replica and the nth (n >= 2) replica vary from round to round, so the replicas are spread more evenly.

The assignment of partition replicas to brokers described above is implemented by kafka.admin.AdminUtils.assignReplicasToBrokers(). Its parameters are:

    brokerList: the list of Kafka brokers;
    nPartitions: the number of partitions to assign;
    replicationFactor: the number of replicas per partition;
    fixedStartIndex: optional, default -1; its value affects two variables, startIndex and nextReplicaShift (see below);
    startPartitionId: the partition of the topic from which assignment starts, usually 0; it is non-zero when partitions are added to an existing topic.

    val ret = new mutable.HashMap[Int, List[Int]]()

The result is stored in the map ret, whose key is the partition id and whose value is the list of assigned brokers.

    val startIndex = if (fixedStartIndex >= 0) fixedStartIndex else rand.nextInt(brokerList.size)
    var currentPartitionId = if (startPartitionId >= 0) startPartitionId else 0
    var nextReplicaShift = if (fixedStartIndex >= 0) fixedStartIndex else rand.nextInt(brokerList.size)

startIndex corresponds to startingBroker, currentPartitionId is the partition currently being assigned brokers, and nextReplicaShift is the current increasingShift value.

Next comes a loop that assigns brokers to the replicas of each partition. The partition's 1st replica is determined by "(currentPartitionId + startIndex) % brokerList.size"; the remaining replicas are determined by replicaIndex().

shift is the distance between the nth (n >= 2) replica and the 1st replica. The expression "1 + (secondReplicaShift + replicaIndex) % (nBrokers - 1)" is neat: it guarantees that shift stays in the range [1, nBrokers-1] (you can verify this yourself), so a later replica never lands on the same broker as the 1st replica.
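The round-robin process described in section 2 can be sketched in Python as follows. This is a minimal sketch rather than Kafka's actual source: the function and parameter names are Python renderings of the Scala names quoted in the text, and the rule used to increment the shift (whenever the partition id is a positive multiple of the broker count) is an assumption based on step (5) of the described process.

```python
import random


def replica_index(first_replica_index, second_replica_shift, replica_idx, n_brokers):
    """Broker index for the (replica_idx + 2)-th replica of a partition."""
    # shift is always within [1, n_brokers - 1], so this replica can never
    # land on the broker that holds the 1st replica.
    shift = 1 + (second_replica_shift + replica_idx) % (n_brokers - 1)
    return (first_replica_index + shift) % n_brokers


def assign_replicas_to_brokers(broker_list, n_partitions, replication_factor,
                               fixed_start_index=-1, start_partition_id=-1,
                               rng=random):
    """Return {partition_id: [broker, ...]} using the round-robin scheme."""
    n_brokers = len(broker_list)
    if replication_factor > n_brokers:
        raise ValueError("replication factor must be <= number of brokers")

    # A negative fixed_start_index means "pick randomly".
    start_index = fixed_start_index if fixed_start_index >= 0 else rng.randrange(n_brokers)
    current_partition_id = start_partition_id if start_partition_id >= 0 else 0
    next_replica_shift = fixed_start_index if fixed_start_index >= 0 else rng.randrange(n_brokers)

    ret = {}
    for _ in range(n_partitions):
        # Assumption: bump the shift once every broker has been the
        # starting broker for one partition in the current round.
        if current_partition_id > 0 and current_partition_id % n_brokers == 0:
            next_replica_shift += 1
        first = (current_partition_id + start_index) % n_brokers
        replicas = [broker_list[first]]
        for j in range(replication_factor - 1):
            replicas.append(broker_list[replica_index(first, next_replica_shift, j, n_brokers)])
        ret[current_partition_id] = replicas
        current_partition_id += 1
    return ret


# The worked example: 5 brokers, 10 partitions, 3 replicas, start at broker-0.
for pid, brokers in assign_replicas_to_brokers([0, 1, 2, 3, 4], 10, 3, fixed_start_index=0).items():
    print("p%d -> %s" % (pid, brokers))
```

Passing fixed_start_index=0 makes the run deterministic, which is convenient for comparing the output with the distribution table above; leaving it at -1 gives the random start that a real topic creation would use.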
3. Kafka Resource Isolation Scheme

In real-time data processing scenarios with large data volumes, we typically specify a large number of partitions when creating a topic in order to guarantee write and consume throughput. The data is then spread over as many partitions as possible, and the partitions are distributed as evenly as possible across the brokers in the Kafka cluster. From a load-balancing standpoint, everything is fine.

From the business point of view, however, there is a resource contention problem. The bandwidth of a Kafka broker machine is limited, and when bandwidth is tight, a fluctuation in the data volume of any one business party (here meaning an increase) affects all business parties. From an operations point of view, there is an availability problem: every Kafka broker machine carries data transfer and storage for all topics, so a single outage affects all topics.

To address this, we propose a resource isolation scheme based on dividing the brokers into resource pools. Suppose the Kafka cluster consists of 9 brokers (broker-1, broker-2, ..., broker-9), and we create 9 topics (t1, t2, ..., t9), each with 9 partitions (assuming a replica count of 1). We cut the 9 brokers into 3 resource pools:

    pool1: broker-1, broker-2, broker-3
    pool2: broker-4, broker-5, broker-6
    pool3: broker-7, broker-8, broker-9

The topics are distributed as follows:

    pool1: t1, t2, t3
    pool2: t4, t5, t6
    pool3: t7, t8, t9

The physical resources of the three pools are completely independent, so the three pools are effectively three small clusters. Dividing resources this way not only isolates physical resources but also, to some extent, addresses the problem of heterogeneous machine types (memory, disk): machines of a similar type can be grouped into one resource pool.
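The pool layout above can be written down as data, which makes it easy to check that a topic only ever touches brokers of its own pool. The following is a minimal sketch: the names POOLS, TOPIC_TO_POOL and assign_within_pool are hypothetical, and the plain round-robin placement inside a pool is a simplification chosen for illustration, not Kafka's own assignment algorithm.

```python
# Resource pools from the example: broker ids grouped into three pools.
POOLS = {
    "pool1": [1, 2, 3],
    "pool2": [4, 5, 6],
    "pool3": [7, 8, 9],
}

# t1..t3 -> pool1, t4..t6 -> pool2, t7..t9 -> pool3.
TOPIC_TO_POOL = {"t%d" % i: "pool%d" % ((i - 1) // 3 + 1) for i in range(1, 10)}


def assign_within_pool(topic, n_partitions, replication_factor=1):
    """Place each partition's replicas on brokers of the topic's pool only."""
    brokers = POOLS[TOPIC_TO_POOL[topic]]
    if replication_factor > len(brokers):
        raise ValueError("replication factor must be <= pool size")
    # Simple round-robin inside the pool (an illustrative simplification).
    return {
        p: [brokers[(p + r) % len(brokers)] for r in range(replication_factor)]
        for p in range(n_partitions)
    }


def brokers_used(topic, n_partitions=9):
    """Set of broker ids that hold at least one replica of the topic."""
    assignment = assign_within_pool(topic, n_partitions)
    return {b for replicas in assignment.values() for b in replicas}


for topic in sorted(TOPIC_TO_POOL):
    print(topic, TOPIC_TO_POOL[topic], sorted(brokers_used(topic)))
```

Because the pools are disjoint, an outage or a traffic spike on a broker in one pool can only affect the three topics mapped to that pool.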
In practice, the resource pools have to be divided sensibly according to the business situation and the machines available, and each topic has to be assigned to a suitable pool according to its own characteristics. Creating a Kafka topic therefore becomes a two-step process:
(1) use kafka-topics.sh to create the topic;
(2) use kafka-reassign-partitions.sh to move the topic's partition replicas to the specified resource pool (a specific list of brokers).
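Step (2) takes a JSON file that lists, for every partition, the brokers its replicas should live on. A minimal sketch of producing that file content is shown below; the helper name build_reassignment, the topic t1 and the three-partition assignment are made up for illustration.

```python
import json


def build_reassignment(topic, assignment):
    """Build the reassignment plan for one topic.

    assignment maps partition id -> list of target broker ids.
    """
    return {
        "version": 1,
        "partitions": [
            {"topic": topic, "partition": p, "replicas": list(replicas)}
            for p, replicas in sorted(assignment.items())
        ],
    }


# Hypothetical example: move topic t1 onto pool1 (brokers 1, 2, 3).
plan = build_reassignment("t1", {0: [1], 1: [2], 2: [3]})
reassignment_json = json.dumps(plan, indent=2)
print(reassignment_json)

# Save the printed JSON to a file (for example reassign.json) and run:
#   kafka-reassign-partitions.sh --zookeeper zookeeperhost:zookeeperport \
#       --reassignment-json-file reassign.json --execute
```

Generating the plan from the pool definition, instead of writing the JSON by hand, keeps the broker lists consistent across all topics of one pool.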