Kafka Learning: Installation of a Kafka Cluster under CentOS

Kafka is a distributed MQ system developed and open-sourced by LinkedIn; it is now an Apache incubator project. Its homepage describes Kafka as a high-throughput distributed MQ (messages can be distributed across different nodes). In a blog post, the Kafka authors briefly mention the reasons for developing Kafka rather than choosing an existing MQ system: performance and scalability. Kafka is written in only about 7,000 lines of Scala. Reportedly, Kafka can produce about 250,000 messages per second (50 MB/s) and consume about 550,000 messages per second (110 MB/s).

Preparation for installation
version
Kafka version: kafka_2.10-0.8.2.0

Zookeeper version: 3.4.6

Zookeeper cluster: hadoop104, hadoop107, hadoop108

For the construction of Zookeeper cluster, please refer to: Installing ZooKeeper Cluster on CentOS

Physical environment
Set up two physical machines:

192.168.40.104 hadoop104 (running 3 brokers)

192.168.40.105 hadoop105 (running 2 brokers)

Building the cluster is divided into three steps: single node with a single broker, single node with multiple brokers, and multiple nodes with multiple brokers.



Single node single broker
This section takes the creation of a single broker on hadoop104 as an example.

Download Kafka

Download path: http://kafka.apache.org/downloads.html

# tar -xvf kafka_2.10-0.8.2.0.tgz
# cd kafka_2.10-0.8.2.0
Configuration
Modify config/server.properties

     broker.id=1
     port=9092
     host.name=hadoop104
     socket.send.buffer.bytes=1048576
     socket.receive.buffer.bytes=1048576
     socket.request.max.bytes=104857600
     log.dir=./kafka1-logs
     num.partitions=10
     zookeeper.connect=hadoop107:2181,hadoop104:2181,hadoop108:2181


Start Kafka service

# bin/kafka-server-start.sh config/server.properties
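Note that this runs the broker in the foreground. To keep it running after the shell exits, a plain shell approach (nothing Kafka-specific) is to background it with nohup; the log file name here is arbitrary:

# nohup bin/kafka-server-start.sh config/server.properties > kafka1.log 2>&1 &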



Create Topic

# bin/kafka-topics.sh --create --zookeeper hadoop107:2181,hadoop104:2181,hadoop108:2181 --replication-factor 1 --partitions 1 --topic test
View Topic

# bin/kafka-topics.sh --list --zookeeper hadoop107:2181,hadoop104:2181,hadoop108:2181
Output (the topic just created):

test

Producer sends a message

# bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
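The console producer reads lines from standard input and sends each line as a separate message to the topic. For example, you can type (any text works; these lines are just illustrative):

This is a message
This is another message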



Consumer receives the messages

# bin/kafka-console-consumer.sh --zookeeper hadoop107:2181,hadoop104:2181,hadoop108:2181 --topic test --from-beginning
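If you typed the two example lines into the producer, the consumer should print them back (illustrative output):

This is a message
This is another message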



If you only want new messages, omit the --from-beginning parameter.

# bin/kafka-console-consumer.sh --zookeeper hadoop107:2181,hadoop104:2181,hadoop108:2181 --topic test



Single node multiple brokers

Configuration
Make two copies of the folder from the previous section, named kafka_2 and kafka_3:

# cp -r kafka_2.10-0.8.2.0 kafka_2

# cp -r kafka_2.10-0.8.2.0 kafka_3

Modify the broker.id and port properties in the kafka_2/config/server.properties and kafka_3/config/server.properties files so that each broker is unique. Note that each broker also needs its own log directory; here the relative log.dir path resolves under each copied folder as long as the broker is started from inside that folder, so the log directories do not collide.

kafka_2/config/server.properties

broker.id=2
port=9093

kafka_3/config/server.properties

broker.id=3
port=9094

Start the other two brokers:

# cd kafka_2
# bin/kafka-server-start.sh config/server.properties &
# cd ../kafka_3
# bin/kafka-server-start.sh config/server.properties &
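To confirm that all three brokers have registered, you can check the /brokers/ids znode with the ZooKeeper CLI (zkCli.sh comes with the ZooKeeper installation referenced earlier; the output shown is illustrative):

# zkCli.sh -server hadoop104:2181
[zk: hadoop104:2181(CONNECTED) 0] ls /brokers/ids
[1, 2, 3]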

Create a topic with a replication factor of 3
# bin/kafka-topics.sh --create --zookeeper hadoop107:2181,hadoop104:2181,hadoop108:2181 --replication-factor 3 --partitions 1 --topic my-replicated-topic
View the status of the topic

# bin/kafka-topics.sh --describe --zookeeper hadoop107:2181,hadoop104:2181,hadoop108:2181 --topic my-replicated-topic
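The output should look roughly like the following (illustrative; the actual leader and replica assignment depends on your cluster, but here node 3 is the leader, matching the explanation below):

Topic:my-replicated-topic   PartitionCount:1   ReplicationFactor:3   Configs:
    Topic: my-replicated-topic   Partition: 0   Leader: 3   Replicas: 3,1,2   Isr: 3,1,2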

As can be seen from the output, the topic has 1 partition with a replication factor of 3, and node 3 is the leader. The fields are explained as follows:
"leader" is the node responsible for all reads and writes for the given partition. Each node will be the leader for a randomly selected portion of the partitions.
"replicas" is the list of nodes that replicate the log for this partition regardless of whether they are the leader or even if they are currently alive.
"isr" is the set of "in-sync" replicas. This is the subset of the replicas list that is currently alive and caught-up to the leader.

Now look at the test topic created earlier; as the describe command below shows, it has no replication (replication factor 1).
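This reuses the same describe command as above, pointed at the test topic:

# bin/kafka-topics.sh --describe --zookeeper hadoop107:2181,hadoop104:2181,hadoop108:2181 --topic test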

Multiple brokers on multiple nodes
On hadoop105, unzip the downloaded archive into two folders, kafka_4 and kafka_5, then copy the server.properties configuration from hadoop104 into each folder:
# scp -r config/ root@192.168.40.105:/root/hadoop/kafka_4/
# scp -r config/ root@192.168.40.105:/root/hadoop/kafka_5/

Configure and modify the contents as follows:
    kafka_4
        broker.id=4
        port=9095
        host.name=hadoop105
    kafka_5
        broker.id=5
        port=9096
        host.name=hadoop105

Start the services:

# cd kafka_4
# bin/kafka-server-start.sh config/server.properties &
# cd ../kafka_5
# bin/kafka-server-start.sh config/server.properties &


At this point, all 5 brokers on the two physical machines have been started.
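As an optional sanity check (not part of the original steps; the topic name cluster-test is arbitrary), you can create a topic replicated across all 5 brokers and inspect it with the same commands used earlier:

# bin/kafka-topics.sh --create --zookeeper hadoop107:2181,hadoop104:2181,hadoop108:2181 --replication-factor 5 --partitions 1 --topic cluster-test
# bin/kafka-topics.sh --describe --zookeeper hadoop107:2181,hadoop104:2181,hadoop108:2181 --topic cluster-test

If all 5 brokers are up, the describe output should list 5 replicas, all of them in the ISR.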



Summary
A core idea of Kafka is that there is no need to cache data in application memory, because the operating system's file cache is already complete and powerful: as long as writes are not random, sequential read and write performance is very high. Kafka only appends data sequentially, and its deletion strategy is to remove data after it accumulates to a certain size or after a certain period of time.

Another distinctive aspect of Kafka is that consumer state is stored on the client rather than on the MQ server, so the server does not need to track the delivery of each message; each client knows for itself where to resume reading next time. Message delivery also uses a pull model, with the client actively fetching data, which greatly reduces the burden on the server.

Kafka also emphasizes reducing serialization and copying overhead. It organizes messages into Message Sets for batch storage and sending, and when the client pulls data it transmits in zero-copy mode where possible, using sendfile (corresponding to Java's FileChannel.transferTo/transferFrom) to reduce copy overhead.

It can be seen that Kafka is a well-designed MQ system tailored to certain applications. I expect more MQ systems specialized for specific domains to appear, given the value of a vertical product strategy.

As long as disk space holds out and no data is lost, Kafka can retain messages for quite a long time (e.g., one week).
