Install a Kafka Cluster on CentOS


Kafka is a distributed MQ (message queue) system developed and open-sourced by LinkedIn, and is now an Apache incubator project. On its homepage, Kafka is described as a high-throughput distributed messaging system that can distribute messages across different nodes. In a blog post, the author briefly explained why LinkedIn built Kafka instead of adopting an existing MQ system, citing two reasons: performance and scalability. Kafka is implemented in only about 7,000 lines of Scala. Reportedly, it can produce about 250,000 messages per second (50 MB/s) and consume about 550,000 messages per second (110 MB/s).

Installation preparation

Kafka: kafka_2.10-0.8.2.0

Zookeeper version: 3.4.6

Zookeeper cluster: hadoop104, hadoop107, hadoop108

For how to build a Zookeeper cluster, see installing ZooKeeper cluster on CentOS.

Physical Environment

Two hosts are used:

192.168.40.104 hadoop104 (runs 3 brokers)

192.168.40.105 hadoop105 (runs 2 brokers)

Building the cluster is divided into three steps: single-node single-broker, single-node multi-broker, and multi-node multi-broker.

Single-node single Broker

This section uses creating a Broker on hadoop104 as an example.

Download Kafka

Download path: http://kafka.apache.org/downloads.html
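If you prefer to download from the command line, the release can usually be fetched directly from the Apache archive (the URL below is an assumption; verify the exact mirror link on the downloads page):

# wget https://archive.apache.org/dist/kafka/0.8.2.0/kafka_2.10-0.8.2.0.tgz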

# tar -xvf kafka_2.10-0.8.2.0.tgz
# cd kafka_2.10-0.8.2.0
Configuration

Modify config/server.properties

broker.id=1
port=9092
host.name=hadoop104
socket.send.buffer.bytes=1048576
socket.receive.buffer.bytes=1048576
socket.request.max.bytes=104857600
log.dir=./kafka1-logs
num.partitions=10
zookeeper.connect=hadoop107:2181,hadoop104:2181,hadoop108:2181


Start the Kafka Service

# bin/kafka-server-start.sh config/server.properties
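To confirm the broker started correctly, a quick check (a minimal sketch, assuming a JDK is installed so that jps is available, and that the server log is written to the default logs/ directory under the Kafka installation) is:

# jps | grep Kafka
# tail -n 20 logs/server.log

The Kafka process should appear in the jps output, and the server log should show the broker registering itself in ZooKeeper.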

Create a Topic

# bin/kafka-topics.sh --create --zookeeper hadoop107:2181,hadoop104:2181,hadoop108:2181 --replication-factor 1 --partitions 1 --topic test

View topics

# bin/kafka-topics.sh --list --zookeeper hadoop107:2181,hadoop104:2181,hadoop108:2181

Output: the topic test created above should be listed.

Producer sends messages

# bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
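The console producer reads lines from standard input and sends each line as a separate message to the topic. A sample session might look like this (the two messages are just illustrative text):

# bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
hello kafka
this is a test message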


Consumer receives messages

# bin/kafka-console-consumer.sh --zookeeper hadoop107:2181,hadoop104:2181,hadoop108:2181 --topic test --from-beginning

If you only want the latest data, simply remove the --from-beginning parameter.

# bin/kafka-console-consumer.sh --zookeeper hadoop107:2181,hadoop104:2181,hadoop108:2181 --topic test

Multiple brokers in a Single Node

Configuration

Copy the Kafka folder from the previous section to kafka_2 and kafka_3.

# cp -r kafka_2.10-0.8.2.0 kafka_2
# cp -r kafka_2.10-0.8.2.0 kafka_3

Modify the broker.id and port properties in kafka_2/config/server.properties and kafka_3/config/server.properties respectively, so that each broker has a unique id and port.

kafka_2/config/server.properties:

broker.id=2
port=9093

kafka_3/config/server.properties:

broker.id=3
port=9094
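If you prefer to script these edits instead of changing the files by hand, sed can rewrite both properties in place. This is just a sketch and assumes the stock server.properties layout, where broker.id and port each appear once at the beginning of a line:

# sed -i 's/^broker.id=.*/broker.id=2/; s/^port=.*/port=9093/' kafka_2/config/server.properties
# sed -i 's/^broker.id=.*/broker.id=3/; s/^port=.*/port=9094/' kafka_3/config/server.properties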

Start the other two brokers
# cd kafka_2
# bin/kafka-server-start.sh config/server.properties &
# cd ../kafka_3
# bin/kafka-server-start.sh config/server.properties &

Create a topic with replication factor 3
# bin/kafka-topics.sh --create --zookeeper hadoop107:2181,hadoop104:2181,hadoop108:2181 --replication-factor 3 --partitions 1 --topic my-replicated-topic
View the topic status

bin/kafka-topics.sh --describe --zookeeper hadoop107:2181,hadoop104:2181,hadoop108:2181 --topic my-replicated-topic
From the output we can see that the topic has 1 partition, its replication factor is 3, and node 3 is the leader (an illustrative example of the describe output is shown after the list below):
  • "Leader" is the node responsible for all reads and writes for the given partition. Each node will be the leader for a randomly selected portion of the partitions.
  • "Replicas" is the list of nodes that replicate the log for this partition regardless of whether they are the leader or even if they are currently alive.
  • "Isr" is the set of "in-sync" replicas. This is the subset of the replicas list that is currently alive and caught-up to the leader.
Looking back at the test topic created earlier, we can see that it has no replicas on other brokers, since it was created with a replication factor of 1.

Multiple brokers on multiple nodes

On hadoop105, extract the downloaded package into the kafka_4 and kafka_5 folders, then copy the config directory (containing server.properties) from hadoop104 to the corresponding folders:
# scp -r config/ root@hadoop105:/root/hadoop/kafka_4/
# scp -r config/ root@hadoop105:/root/hadoop/kafka_5/
Modify the configuration as follows:
kafka_4/config/server.properties:

broker.id=4
port=9095
host.name=hadoop105

kafka_5/config/server.properties:

broker.id=5
port=9096
host.name=hadoop105
Start the services
# cd kafka_4
# bin/kafka-server-start.sh config/server.properties &
# cd ../kafka_5
# bin/kafka-server-start.sh config/server.properties &
Up to now, five brokers on two physical machines have been started.
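To verify that all five brokers registered with ZooKeeper, one way (a sketch; zookeeper-shell.sh should be included in the Kafka bin directory) is to list the broker ids stored under /brokers/ids:

# bin/zookeeper-shell.sh hadoop104:2181
ls /brokers/ids

With the configuration above, this should return the five broker ids, e.g. [1, 2, 3, 4, 5].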

Summary

A core idea of Kafka is that data does not need to be cached in application memory, because the operating system's file cache is already powerful enough: as long as writes are not random, sequential read/write performance is very high. Kafka only appends data sequentially; its deletion policy is to remove data after it accumulates to a certain size or after a certain period of time. Another distinctive feature of Kafka is that consumer state is kept on the client rather than on the MQ server, so the server does not need to track message delivery; each client knows where to resume reading. Message delivery also uses a client-side pull model, which greatly reduces the load on the server.

Kafka also emphasizes reducing serialization and copy overhead: it organizes messages into message sets for batch storage and transmission, and when clients pull data it tries to transfer it in zero-copy mode using sendfile (corresponding to advanced I/O methods such as FileChannel.transferTo/transferFrom in Java) to reduce copy overhead. Overall, Kafka is a well-designed MQ system tailored to specific applications, and I expect more and more MQ systems to focus on specific domains and vertical use cases.

As long as disk capacity permits, Kafka can retain messages for a long period of time (by default, one week).
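The retention window is configurable in server.properties. A minimal sketch of the relevant settings (the values shown are the common defaults):

# delete log segments older than 7 days (168 hours)
log.retention.hours=168
# optionally also cap retention by size; -1 means no size limit
log.retention.bytes=-1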
