Install a Kafka Cluster on CentOS


Kafka is a distributed MQ (message queue) system developed and open-sourced by LinkedIn, and is now an Apache incubator project. On its homepage, Kafka is described as a high-throughput distributed messaging system that can distribute messages across different nodes. In a blog post, the author briefly explained why LinkedIn built Kafka instead of adopting an existing MQ system, citing two reasons: performance and scalability. Kafka is implemented in only about 7,000 lines of Scala. Reportedly, it can produce about 250,000 messages per second (50 MB/s) and consume about 550,000 messages per second (110 MB/s).

Installation preparation

Kafka: kafka_2.10-0.8.2.0

Zookeeper version: 3.4.6

Zookeeper cluster: hadoop104, hadoop107, hadoop108

For how to build a Zookeeper cluster, see installing ZooKeeper cluster on CentOS.

Physical Environment

Two hosts are used:

192.168.40.104 hadoop104 (runs 3 brokers)

192.168.40.105 hadoop105 (runs 2 brokers)

Building the cluster is divided into three steps: single-node single-broker, single-node multi-broker, and multi-node multi-broker.

Single-node single Broker

This section uses creating a Broker on hadoop104 as an example.

Download Kafka

Download path: http://kafka.apache.org/downloads.html
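If you prefer to download from the command line, the release can usually be fetched directly from the Apache archive (the URL below is an assumption; verify the exact mirror link on the downloads page):

# wget https://archive.apache.org/dist/kafka/0.8.2.0/kafka_2.10-0.8.2.0.tgz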

# tar -xvf kafka_2.10-0.8.2.0.tgz
# cd kafka_2.10-0.8.2.0
Configuration

Modify config/server.properties

broker.id=1
port=9092
host.name=hadoop104
socket.send.buffer.bytes=1048576
socket.receive.buffer.bytes=1048576
socket.request.max.bytes=104857600
log.dir=./kafka1-logs
num.partitions=10
zookeeper.connect=hadoop107:2181,hadoop104:2181,hadoop108:2181


Start the Kafka Service

# bin/kafka-server-start.sh config/server.properties
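To confirm the broker started correctly, a quick check (a minimal sketch, assuming a JDK is installed so that jps is available, and that the server log is written to the default logs/ directory under the Kafka installation) is:

# jps | grep Kafka
# tail -n 20 logs/server.log

The Kafka process should appear in the jps output, and the server log should show the broker registering itself in ZooKeeper.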

Create a Topic

# bin/kafka-topics.sh --create --zookeeper hadoop107:2181,hadoop104:2181,hadoop108:2181 --replication-factor 1 --partitions 1 --topic test

View topics

# bin/kafka-topics.sh --list --zookeeper hadoop107:2181,hadoop104:2181,hadoop108:2181

Output: the topic test created above should be listed.

Producer sends messages

# bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
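The console producer reads lines from standard input and sends each line as a separate message to the topic. A sample session might look like this (the two messages are just illustrative text):

# bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
hello kafka
this is a test message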


Consumer receives messages

# bin/kafka-console-consumer.sh --zookeeper hadoop107:2181,hadoop104:2181,hadoop108:2181 --topic test --from-beginning

If you only want the latest data, simply remove the --from-beginning parameter.

# bin/kafka-console-consumer.sh --zookeeper hadoop107:2181,hadoop104:2181,hadoop108:2181 --topic test

Multiple brokers in a Single Node

Configuration

Copy the Kafka folder from the previous section to kafka_2 and kafka_3.

# cp -r kafka_2.10-0.8.2.0 kafka_2
# cp -r kafka_2.10-0.8.2.0 kafka_3

Modify the broker.id and port properties in kafka_2/config/server.properties and kafka_3/config/server.properties respectively, so that each broker has a unique id and port.

kafka_2/config/server.properties:

broker.id=2
port=9093

kafka_3/config/server.properties:

broker.id=3
port=9094
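If you prefer to script these edits instead of changing the files by hand, sed can rewrite both properties in place. This is just a sketch and assumes the stock server.properties layout, where broker.id and port each appear once at the beginning of a line:

# sed -i 's/^broker.id=.*/broker.id=2/; s/^port=.*/port=9093/' kafka_2/config/server.properties
# sed -i 's/^broker.id=.*/broker.id=3/; s/^port=.*/port=9094/' kafka_3/config/server.properties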

Start the other two brokers
# cd kafka_2
# bin/kafka-server-start.sh config/server.properties &
# cd ../kafka_3
# bin/kafka-server-start.sh config/server.properties &

Create a topic with replication factor 3
# bin/kafka-topics.sh --create --zookeeper hadoop107:2181,hadoop104:2181,hadoop108:2181 --replication-factor 3 --partitions 1 --topic my-replicated-topic
View the topic status

bin/kafka-topics.sh --describe --zookeeper hadoop107:2181,hadoop104:2181,hadoop108:2181 --topic my-replicated-topic
From the output we can see that the topic has 1 partition, its replication factor is 3, and node 3 is the leader (an illustrative example of the describe output is shown after the list below):
  • "Leader" is the node responsible for all reads and writes for the given partition. Each node will be the leader for a randomly selected portion of the partitions.
  • "Replicas" is the list of nodes that replicate the log for this partition regardless of whether they are the leader or even if they are currently alive.
  • "Isr" is the set of "in-sync" replicas. This is the subset of the replicas list that is currently alive and caught-up to the leader.
Looking back at the test topic created earlier, we can see that it has no replicas on other brokers, since it was created with a replication factor of 1.

Multiple brokers on multiple nodes

On hadoop105, extract the downloaded package into the kafka_4 and kafka_5 folders, then copy the config directory (containing server.properties) from hadoop104 to the corresponding folders:
# scp -r config/ root@hadoop105:/root/hadoop/kafka_4/
# scp -r config/ root@hadoop105:/root/hadoop/kafka_5/
Modify the configuration as follows:
kafka_4/config/server.properties:

broker.id=4
port=9095
host.name=hadoop105

kafka_5/config/server.properties:

broker.id=5
port=9096
host.name=hadoop105
Start the services
# cd kafka_4
# bin/kafka-server-start.sh config/server.properties &
# cd ../kafka_5
# bin/kafka-server-start.sh config/server.properties &
Up to now, five brokers on two physical machines have been started.
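To verify that all five brokers registered with ZooKeeper, one way (a sketch; zookeeper-shell.sh should be included in the Kafka bin directory) is to list the broker ids stored under /brokers/ids:

# bin/zookeeper-shell.sh hadoop104:2181
ls /brokers/ids

With the configuration above, this should return the five broker ids, e.g. [1, 2, 3, 4, 5].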

Summary

A core idea of Kafka is that data does not need to be cached in application memory, because the operating system's file cache is already powerful enough: as long as writes are not random, sequential read/write performance is very high. Kafka only appends data sequentially; its deletion policy is to remove data after it accumulates to a certain size or after a certain period of time. Another distinctive feature of Kafka is that consumer state is kept on the client rather than on the MQ server, so the server does not need to track message delivery; each client knows where to resume reading. Message delivery also uses a client-side pull model, which greatly reduces the load on the server.

Kafka also emphasizes reducing serialization and copy overhead: it organizes messages into message sets for batch storage and transmission, and when clients pull data it tries to transfer it in zero-copy mode using sendfile (corresponding to advanced I/O methods such as FileChannel.transferTo/transferFrom in Java) to reduce copy overhead. Overall, Kafka is a well-designed MQ system tailored to specific applications, and I expect more and more MQ systems to focus on specific domains and vertical use cases.

As long as disk capacity permits, Kafka can retain messages for a long period of time (by default, one week).
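The retention window is configurable in server.properties. A minimal sketch of the relevant settings (the values shown are the common defaults):

# delete log segments older than 7 days (168 hours)
log.retention.hours=168
# optionally also cap retention by size; -1 means no size limit
log.retention.bytes=-1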
