Kafka Learning: Installing a Kafka Cluster on CentOS
Source: Internet
Author: User
Kafka is a distributed message queue (MQ) system developed and open sourced by LinkedIn; it entered Apache through the Incubator and is now an Apache project. Its homepage describes Kafka as a high-throughput distributed MQ, meaning that messages can be spread across different nodes. In a blog post, the author briefly gives two reasons for developing Kafka rather than adopting an existing MQ system: performance and scalability. The original Kafka was written in only about 7,000 lines of Scala. Kafka is reported to produce about 250,000 messages per second (50 MB) and consume about 550,000 messages per second (110 MB).
Preparation for installation
Version
Kafka version: kafka_2.10-0.8.2.0
Configuration
Copy the Kafka folder from the previous chapter into two new folders, kafka_2 and kafka_3:
# cp -r kafka_2.10-0.8.2.0 kafka_2
# cp -r kafka_2.10-0.8.2.0 kafka_3
Modify the broker.id and port properties in kafka_2/config/server.properties and kafka_3/config/server.properties so that each broker's values are unique. Because these brokers share a host, each one also needs its own log.dirs directory.
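The property changes above can also be scripted. The following is a minimal sketch, not part of Kafka: set_prop is a hypothetical helper, the values 2 and 9093 are assumed (the original does not state which id and port kafka_2 uses), and the demo works on a scratch file so it can be tried safely before pointing it at kafka_2/config/server.properties.

```shell
# set_prop FILE KEY VALUE: set KEY=VALUE in a properties file,
# replacing an existing line or appending a new one.
# Hypothetical helper; relies on GNU sed's -i, as shipped with CentOS.
set_prop() {
  file=$1
  key=$2
  value=$3
  # Escape dots so that "broker.id" is matched literally.
  pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
  if grep -q "^${pattern}=" "$file"; then
    # '|' is the sed delimiter so values containing '/' work;
    # values containing '|' or '&' are not handled by this sketch.
    sed -i "s|^${pattern}=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Demo on a scratch copy that mimics the stock defaults.
demo=$(mktemp)
printf 'broker.id=0\nport=9092\nlog.dirs=/tmp/kafka-logs\n' > "$demo"

set_prop "$demo" broker.id 2                 # assumed id for kafka_2
set_prop "$demo" port 9093                   # assumed port for kafka_2
set_prop "$demo" log.dirs /tmp/kafka-logs-2  # separate data directory

cat "$demo"
```

To apply it for real, replace "$demo" with the path of each broker's server.properties and pick values that are unique across the cluster.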
Start the other two brokers
# cd kafka_2
# bin/kafka-server-start.sh config/server.properties &
# cd ../kafka_3
# bin/kafka-server-start.sh config/server.properties &
Create a topic with a replication factor of 3
# bin/kafka-topics.sh --create --zookeeper hadoop107:2181,hadoop104:2181,hadoop108:2181 --replication-factor 3 --partitions 1 --topic my-replicated-topic
View the status of the topic
# bin/kafka-topics.sh --describe --zookeeper hadoop107:2181,hadoop104:2181,hadoop108:2181 --topic my-replicated-topic
As the output shows, the topic has 1 partition, the replication factor is 3, and node 3 is the leader. The fields are explained as follows:
"leader" is the node responsible for all reads and writes for the given partition. Each node will be the leader for a randomly selected portion of the partitions.
"replicas" is the list of nodes that replicate the log for this partition regardless of whether they are the leader or even if they are currently alive.
"isr" is the set of "in-sync" replicas. This is the subset of the replicas list that is currently alive and caught-up to the leader.
For comparison, describing the test topic that was created earlier shows that it has no replication.
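The relationship between the replicas and isr fields explained above can be checked with a few lines of shell. This is only a sketch: the sample line is illustrative (it is not the author's actual output, and its broker ids are made up), and it assumes the tab-separated "Label: value" layout that kafka-topics.sh --describe prints for each partition.

```shell
# Illustrative partition line; in practice it would come from
# the --describe command shown earlier.
line=$(printf '\tTopic: my-replicated-topic\tPartition: 0\tLeader: 3\tReplicas: 3,1,2\tIsr: 3,1')

# field_of LABEL: print the value that follows "LABEL: " in $line.
field_of() {
  printf '%s\n' "$line" | tr '\t' '\n' | sed -n "s/^$1: //p"
}

# count LIST: number of entries in a comma-separated list.
count() {
  printf '%s\n' "$1" | tr ',' '\n' | grep -c .
}

leader=$(field_of Leader)
replicas=$(field_of Replicas)
isr=$(field_of Isr)

# A partition is under-replicated when fewer replicas are in sync
# than are assigned to it.
if [ "$(count "$isr")" -lt "$(count "$replicas")" ]; then
  status="under-replicated"
else
  status="fully in sync"
fi

echo "leader=$leader replicas=$replicas isr=$isr -> $status"
```

In the sample, broker 2 is assigned as a replica but is missing from the isr list, so the partition is reported as under-replicated.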
Multiple brokers on multiple nodes
On hadoop105, unzip the downloaded archive into two folders, kafka_4 and kafka_5, and then copy the server.properties configuration from hadoop104 into each of these folders:
# scp -r config/ [email protected]:/root/hadoop/kafka_4/
# scp -r config/ [email protected]:/root/hadoop/kafka_5/
Modify the configuration as follows:
kafka_4
broker.id=4
port=9095
host.name=hadoop105
kafka_5
broker.id=5
port=9096
host.name=hadoop105
Start the services
# cd kafka_4
# bin/kafka-server-start.sh config/server.properties &
# cd ../kafka_5
# bin/kafka-server-start.sh config/server.properties &
So far, 5 brokers have been started on two physical machines.
Summary
A core idea in Kafka is that it does not need to cache data in its own memory, because the operating system's file cache is already mature and powerful. As long as random writes are avoided, sequential reads and writes are very efficient. Kafka only ever appends data sequentially, and its deletion strategy is to remove data once it has accumulated to a certain size or after a certain period of time.
Another distinctive aspect of Kafka is that consumer state is kept on the consumer side rather than on the MQ server. The server does not need to track the delivery of each message, and each client knows where it should resume reading next time. Message delivery also uses a pull model driven by the client, which greatly reduces the burden on the server.
Kafka also emphasizes reducing data serialization and copying overhead. It groups messages into a Message Set for batch storage and sending. When a client pulls data, Kafka tries to transmit it in zero-copy mode using sendfile (which corresponds to FileChannel.transferTo / transferFrom in Java).
Kafka is thus an MQ system carefully designed for particular kinds of applications. Given the value of a vertical product strategy, I expect more and more MQ systems of this kind, specialized for a specific field, to appear.
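The append-only log and the client-side read position described above can be illustrated with a toy model in shell. This is a sketch of the idea only, not of Kafka's implementation: a plain text file stands in for one partition, one line is one message, and produce and fetch are hypothetical helper names.

```shell
log=$(mktemp)   # stands in for one partition's log file

# Producer side: messages are only ever appended, never rewritten.
produce() { printf '%s\n' "$1" >> "$log"; }

# Broker side: return up to MAX messages starting at a 0-based OFFSET.
# Note that the broker keeps no per-consumer state.
fetch() { tail -n +"$(( $1 + 1 ))" "$log" | head -n "$2"; }

produce "m0"; produce "m1"; produce "m2"

# Consumer side: the client remembers its own offset and pulls batches.
offset=0
consumed=""
while :; do
  batch=$(fetch "$offset" 2)
  [ -z "$batch" ] && break                  # nothing new to read
  n=$(printf '%s\n' "$batch" | wc -l)       # messages in this batch
  offset=$(( offset + n ))                  # advance the client's position
  consumed="$consumed$(printf '%s' "$batch" | tr '\n' ' ') "
done

echo "consumed: $consumed(next offset: $offset)"
```

Because the position lives with the consumer, re-reading old data is just a matter of calling fetch with a smaller offset; nothing on the "broker" side has to change.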
As long as there is enough disk space and no data is lost, Kafka can retain messages for quite a long time (for example, one week).
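The two deletion triggers mentioned in this summary, age and accumulated size, correspond to broker settings in config/server.properties. A minimal fragment, assuming the property names used by the stock 0.8.x configuration file; the values are illustrative, with 168 hours matching the one week mentioned above:

```properties
# Delete log segments older than this many hours (168 h = 7 days).
log.retention.hours=168
# Optionally also cap the size of each partition's log, in bytes (assumed value).
log.retention.bytes=1073741824
# Size of a single segment file; retention removes whole segments.
log.segment.bytes=1073741824
```

Whichever limit is reached first triggers deletion, so the time setting alone is enough when disk space is not a concern.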