How to manage and balance "Huge Data Load" for Big Kafka Clusters---Reference

Source: Internet
Author: User

1. Add Partition Tool

Partitions act as the unit of parallelism. Messages of a single topic are distributed to multiple partitions that can be stored and served on different servers. Upon creation of a topic, the number of partitions for this topic has to be specified. Later on, more partitions may be needed for this topic when its volume increases. This tool helps to add more partitions for a specific topic and also allows manual replica assignment of the added partitions. You can refer to the previous blog post, "Quick steps: have a Kafka Cluster up & Running in 3 minutes", to set up a Kafka cluster and create topics.

    bin/kafka-add-partitions.sh

    Option                                  Description
    ------                                  -----------
    --partition <Integer: # of partitions>  REQUIRED: Number of partitions to add
                                              to the topic
    --replica-assignment-list               For manually assigning replicas to
      <broker_id_for_part1_replica1 :         brokers for the new partitions
      broker_id_for_part1_replica2,           (default: )
      broker_id_for_part2_replica1 :
      broker_id_for_part2_replica2, ...>
    --topic <topic>                         REQUIRED: The topic for which
                                              partitions need to be added.
    --zookeeper <urls>                      REQUIRED: The connection string for
                                              the zookeeper connection in the form
                                              host:port. Multiple URLS can be
                                              given to allow fail-over.

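
The --replica-assignment-list format in the help text above can be awkward to type by hand. The Python sketch below builds that string and the matching command line. It is only an illustration: the helper names, the topic "foo", the broker IDs, and the ZooKeeper address are assumptions, and the separator layout (":" between the replicas of one partition, "," between partitions) is read off the help output, so check it against your own Kafka 0.8 build before relying on it.

```python
def format_replica_assignment(assignments):
    """Build the --replica-assignment-list value.

    `assignments` holds one list of broker IDs per new partition.
    Assumed layout (taken from the help text): replicas of one partition
    are joined with ':' and partitions are joined with ','.
    """
    if not assignments:
        raise ValueError("at least one partition assignment is required")
    groups = []
    for replicas in assignments:
        if not replicas:
            raise ValueError("each new partition needs at least one replica")
        if len(set(replicas)) != len(replicas):
            raise ValueError(f"duplicate broker in replica list: {replicas}")
        groups.append(":".join(str(broker_id) for broker_id in replicas))
    return ",".join(groups)


def build_add_partitions_command(topic, zookeeper, new_partitions, assignments=None):
    """Return the argument list for bin/kafka-add-partitions.sh (hypothetical helper)."""
    cmd = [
        "bin/kafka-add-partitions.sh",
        "--topic", topic,
        "--partition", str(new_partitions),
        "--zookeeper", zookeeper,
    ]
    if assignments is not None:
        if len(assignments) != new_partitions:
            raise ValueError("need exactly one replica list per added partition")
        cmd += ["--replica-assignment-list", format_replica_assignment(assignments)]
    return cmd


# Example values are assumptions: add 2 partitions to "foo" on brokers 5, 6, 7.
command = build_add_partitions_command(
    "foo", "localhost:2181", 2, assignments=[[5, 6], [6, 7]]
)
print(" ".join(command))
```

The argument list can be passed to subprocess.run once the values match your cluster.
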
2. Reassign Partitions Tool

What does the tool do?

The goal of this tool is similar to that of the Preferred Replica Leader Election tool: to achieve load balance across brokers. But instead of only electing a new leader from the assigned replicas of a partition, this tool allows you to change the assigned replicas of partitions. Remember that followers also need to fetch from leaders in order to keep in sync, hence sometimes balancing only the leadership load is not enough.
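
To make the last point concrete, here is a small Python sketch that counts both leaderships and hosted replicas per broker. The assignment map, topic name, and broker IDs are made up for illustration, and the sketch assumes the first replica in each list is currently the leader.

```python
from collections import Counter


def broker_load(assignment):
    """Count leaderships and hosted replicas per broker.

    `assignment` maps (topic, partition) to a replica list.
    Assumption: the first replica in each list is the current leader.
    """
    leaders = Counter()
    replicas = Counter()
    for replica_list in assignment.values():
        leaders[replica_list[0]] += 1
        replicas.update(replica_list)
    return leaders, replicas


# Hypothetical cluster: 3 brokers, 3 partitions, replication factor 2.
example = {
    ("foo", 0): [1, 2],
    ("foo", 1): [2, 1],
    ("foo", 2): [3, 1],
}

leaders, replicas = broker_load(example)
print("leaders per broker: ", dict(leaders))
print("replicas per broker:", dict(replicas))
```

In this made-up layout every broker leads exactly one partition, yet broker 1 hosts three replicas and broker 3 only one. Leader election alone cannot fix that; the replica lists themselves have to change.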

A summary of the steps that the tool performs is shown below:

1. The tool updates the ZooKeeper path "/admin/reassign_partitions" with the list of topic partitions and (if specified in the JSON file) the list of their new assigned replicas.
2. The controller listens to the path above. When a data change update is triggered, the controller reads the list of topic partitions and their assigned replicas from ZooKeeper.
3. For each topic partition, the controller does the following:
3.1. Start new replicas in RAR - AR (RAR = reassigned replicas, AR = original list of assigned replicas)
3.2. Wait until the new replicas are in sync with the leader
3.3. If the leader is not in RAR, elect a new leader from RAR
3.4. Stop old replicas in AR - RAR
3.5. Write the new AR
3.6. Remove the partition from the /admin/reassign_partitions path
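
The set arithmetic in step 3 can be sketched in a few lines of Python. This is not the controller's code, only an illustration of which replicas are started and stopped for one partition. The function name is hypothetical, and picking the first broker in RAR as the new leader is an assumption; the steps above only say that a leader is elected from RAR.

```python
def plan_reassignment(ar, rar, leader):
    """Work out the per-partition actions from steps 3.1 to 3.5.

    ar     : original list of assigned replicas (AR)
    rar    : reassigned replicas (RAR)
    leader : broker ID of the current leader
    """
    ar_set, rar_set = set(ar), set(rar)
    to_start = sorted(rar_set - ar_set)   # step 3.1: RAR - AR
    to_stop = sorted(ar_set - rar_set)    # step 3.4: AR - RAR
    # Step 3.3: keep the leader if it is in RAR; otherwise pick one from RAR.
    # Choosing rar[0] is an assumption made for this sketch.
    new_leader = leader if leader in rar_set else rar[0]
    return {
        "start": to_start,
        "leader": new_leader,
        "stop": to_stop,
        "new_ar": list(rar),              # step 3.5: RAR becomes the new AR
    }


# Hypothetical partition: replicas move from [1, 2, 3] to [1, 2, 4], leader is 3.
plan = plan_reassignment(ar=[1, 2, 3], rar=[1, 2, 4], leader=3)
print(plan)
```

Here broker 4 is started and must catch up (step 3.2) before broker 3 is stopped, and because the leader 3 is not in RAR, leadership moves to a broker in RAR first.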

How do I use the tool?
    bin/kafka-reassign-partitions.sh

    Option                                  Description
    ------                                  -----------
    --broker-list <brokerlist>              The list of brokers to which the
                                              partitions need to be reassigned in
                                              the form "0,1,2". This is required
                                              for automatic topic reassignment.
    --execute [execute]                     This option does the actual
                                              reassignment. By default, the tool
                                              does a dry run
    --manual-assignment-json-file <manual   The JSON file with the list of manual
      assignment json file path>              reassignments. This option or
                                              topics-to-move-json-file needs to be
                                              specified. The format to use is -
                                              {"partitions":
                                                [{"topic": "foo",
                                                  "partition": 1,
                                                  "replicas": [1,2,3] }],
                                               "version":1
                                              }
    --topics-to-move-json-file <topics to   The JSON file with the list of topics
      reassign json file path>                to reassign. This option or
                                              manual-assignment-json-file needs to be
                                              specified. The format to use is -
                                              {"topics":
                                                [{"topic": "foo"},{"topic": "foo1"}],
                                               "version":1
                                              }
    --zookeeper <urls>                      REQUIRED: The connection string for
                                              the zookeeper connection in the form
                                              host:port. Multiple URLS can be
                                              given to allow fail-over.

3. Add Brokers (Cluster Expansion)

Cluster expansion involves including brokers with new broker IDs in a Kafka cluster. Typically, when you add new brokers to a cluster, they won't receive any data from existing topics until this tool is run to assign existing topics/partitions to the new brokers. The tool provides 2 options to make it easier to move some topics in bulk to the new brokers. These 2 options are a) the topics to move and b) the list of newly added brokers. Using these 2 options, the tool automatically figures out the placement of partitions for the topics on the new brokers.

The following example moves 2 topics (foo1, foo2) to newly added brokers in a cluster (5,6,7).

    > ./bin/kafka-reassign-partitions.sh --topics-to-move-json-file topics-to-move.json --broker-list "5,6,7" --execute

    > cat topics-to-move.json
    {"topics":
         [{"topic": "foo1"},{"topic": "foo2"}],
         "version":1
    }

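
When many topics have to be moved, the JSON file can be generated rather than written by hand. The Python sketch below produces the same document shape as topics-to-move.json and assembles the command line. The function names and the ZooKeeper address are assumptions for illustration; --zookeeper is included because the help text lists it as REQUIRED, and --execute is left off by default so that the tool does its dry run.

```python
import json


def topics_to_move_document(topics):
    """Build the document expected by --topics-to-move-json-file."""
    return {"topics": [{"topic": name} for name in topics], "version": 1}


def reassign_topics_command(json_path, broker_ids, zookeeper, execute=False):
    """Return the argument list for kafka-reassign-partitions.sh (hypothetical helper)."""
    cmd = [
        "./bin/kafka-reassign-partitions.sh",
        "--topics-to-move-json-file", json_path,
        "--broker-list", ",".join(str(b) for b in broker_ids),
        "--zookeeper", zookeeper,
    ]
    if execute:
        # Without --execute the tool only performs a dry run.
        cmd.append("--execute")
    return cmd


document = topics_to_move_document(["foo1", "foo2"])
text = json.dumps(document)
print(text)

# "localhost:2181" is a placeholder address, not taken from the article.
dry_run = reassign_topics_command("topics-to-move.json", [5, 6, 7], "localhost:2181")
print(" ".join(dry_run))
```

Writing `text` to topics-to-move.json and running the dry-run command first gives a chance to review the proposed placement before adding --execute.
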
Selectively moving some partitions to a broker

The partition movement tool can also be used to selectively move some replicas for certain partitions over to a particular broker. Typically, if you end up with an unbalanced cluster, you can use the tool in this mode to selectively move partitions around. In this mode, the tool takes a single file which has a list of partitions to move and the replicas that each of those partitions should be assigned to.

The following example moves 1 partition (foo-1) to replicas 1,2,4.

    > ./bin/kafka-reassign-partitions.sh --manual-assignment-json-file partitions-to-move.json --execute

    > cat partitions-to-move.json
    {"partitions":
         [{"topic": "foo",
           "partition": 1,
           "replicas": [1,2,4] }],
     "version":1
    }
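
A stray bracket in a hand-edited assignment file is easy to miss, so it can help to generate the file and to parse it before handing it to the tool. The Python sketch below does both. The function names and the checks it performs are assumptions for illustration; the tool itself may validate differently.

```python
import json


def manual_assignment_document(moves):
    """Build the document expected by --manual-assignment-json-file.

    `moves` is a list of (topic, partition, replicas) tuples.
    """
    seen = set()
    entries = []
    for topic, partition, replicas in moves:
        if (topic, partition) in seen:
            raise ValueError(f"partition listed twice: {topic}-{partition}")
        if not replicas or len(set(replicas)) != len(replicas):
            raise ValueError(f"bad replica list for {topic}-{partition}: {replicas}")
        seen.add((topic, partition))
        entries.append(
            {"topic": topic, "partition": partition, "replicas": list(replicas)}
        )
    return {"partitions": entries, "version": 1}


def check_manual_assignment(text):
    """Parse an assignment file and check its basic shape.

    Raises ValueError (json.JSONDecodeError is a subclass) on bad input.
    """
    document = json.loads(text)
    if not isinstance(document, dict) or "partitions" not in document:
        raise ValueError('missing top-level "partitions" key')
    for entry in document["partitions"]:
        missing = {"topic", "partition", "replicas"} - set(entry)
        if missing:
            raise ValueError(f"entry is missing keys: {sorted(missing)}")
    return document


# Same move as the example above: partition foo-1 goes to replicas 1, 2, 4.
document = manual_assignment_document([("foo", 1, [1, 2, 4])])
text = json.dumps(document, indent=2)
print(text)
checked = check_manual_assignment(text)
```

Running the check before the tool, and the tool without --execute before the real run, keeps mistakes in the file from turning into unintended partition moves.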

Note: These tools are available in version 0.8, not in prior versions.

http://xebee.xebia.in/index.php/2014/12/04/how-to-manage-and-balance-huge-data-load-for-big-kafka-clusters/
