Kafka Learning: Installation of a Kafka Cluster under CentOS

Kafka is a distributed MQ system developed and open-sourced by LinkedIn; it is now an Apache incubator project. Its homepage describes Kafka as a high-throughput distributed MQ (messages can be distributed across different nodes). In a blog post, the Kafka authors briefly mention the reasons for developing Kafka rather than choosing an existing MQ system: performance and scalability. Kafka is written in only about 7,000 lines of Scala. Reportedly, Kafka can produce about 250,000 messages per second (50 MB/s) and consume about 550,000 messages per second (110 MB/s).

Preparation for installation
version
Kafka version: kafka_2.10-0.8.2.0

Zookeeper version: 3.4.6

Zookeeper cluster: hadoop104, hadoop107, hadoop108

For the construction of Zookeeper cluster, please refer to: Installing ZooKeeper Cluster on CentOS

Physical environment
Set up two physical machines:

192.168.40.104 hadoop104 (running 3 brokers)

192.168.40.105 hadoop105 (running 2 brokers)

Building the cluster is divided into three steps: single node with a single broker, single node with multiple brokers, and multiple nodes with multiple brokers.



Single node single broker
This section takes the creation of a single broker on hadoop104 as an example.

Download Kafka

Download path: http://kafka.apache.org/downloads.html

# tar -xvf kafka_2.10-0.8.2.0.tgz
# cd kafka_2.10-0.8.2.0
Configuration
Modify config/server.properties

     broker.id=1
     port=9092
     host.name=hadoop104
     socket.send.buffer.bytes=1048576
     socket.receive.buffer.bytes=1048576
     socket.request.max.bytes=104857600
     log.dir=./kafka1-logs
     num.partitions=10
     zookeeper.connect=hadoop107:2181,hadoop104:2181,hadoop108:2181


Start Kafka service

# bin/kafka-server-start.sh config/server.properties
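Note that this runs the broker in the foreground. To keep it running after the shell exits, a plain shell approach (nothing Kafka-specific) is to background it with nohup; the log file name here is arbitrary:

# nohup bin/kafka-server-start.sh config/server.properties > kafka1.log 2>&1 &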



Create Topic

# bin/kafka-topics.sh --create --zookeeper hadoop107:2181,hadoop104:2181,hadoop108:2181 --replication-factor 1 --partitions 1 --topic test
View Topic

# bin/kafka-topics.sh --list --zookeeper hadoop107:2181,hadoop104:2181,hadoop108:2181
Output (the topic just created):

test

Producer sends a message

# bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
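The console producer reads lines from standard input and sends each line as a separate message to the topic. For example, you can type (any text works; these lines are just illustrative):

This is a message
This is another message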



Consumer receives the messages

# bin/kafka-console-consumer.sh --zookeeper hadoop107:2181,hadoop104:2181,hadoop108:2181 --topic test --from-beginning
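If you typed the two example lines into the producer, the consumer should print them back (illustrative output):

This is a message
This is another message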



If you only want new messages, omit the --from-beginning parameter.

# bin/kafka-console-consumer.sh --zookeeper hadoop107:2181,hadoop104:2181,hadoop108:2181 --topic test



Single node multiple brokers

Configuration
Make two copies of the folder from the previous section, named kafka_2 and kafka_3:

# cp -r kafka_2.10-0.8.2.0 kafka_2

# cp -r kafka_2.10-0.8.2.0 kafka_3

Modify the broker.id and port properties in the kafka_2/config/server.properties and kafka_3/config/server.properties files so that each broker is unique. Note that each broker also needs its own log directory; here the relative log.dir path resolves under each copied folder as long as the broker is started from inside that folder, so the log directories do not collide.

kafka_2/config/server.properties

broker.id=2
port=9093

kafka_3/config/server.properties

broker.id=3
port=9094

Start the other two brokers:

# cd kafka_2
# bin/kafka-server-start.sh config/server.properties &
# cd ../kafka_3
# bin/kafka-server-start.sh config/server.properties &
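To confirm that all three brokers have registered, you can check the /brokers/ids znode with the ZooKeeper CLI (zkCli.sh comes with the ZooKeeper installation referenced earlier; the output shown is illustrative):

# zkCli.sh -server hadoop104:2181
[zk: hadoop104:2181(CONNECTED) 0] ls /brokers/ids
[1, 2, 3]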

Create a topic with a replication factor of 3
# bin/kafka-topics.sh --create --zookeeper hadoop107:2181,hadoop104:2181,hadoop108:2181 --replication-factor 3 --partitions 1 --topic my-replicated-topic
View the status of the topic

# bin/kafka-topics.sh --describe --zookeeper hadoop107:2181,hadoop104:2181,hadoop108:2181 --topic my-replicated-topic
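The output should look roughly like the following (illustrative; the actual leader and replica assignment depends on your cluster, but here node 3 is the leader, matching the explanation below):

Topic:my-replicated-topic   PartitionCount:1   ReplicationFactor:3   Configs:
    Topic: my-replicated-topic   Partition: 0   Leader: 3   Replicas: 3,1,2   Isr: 3,1,2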

As can be seen from the output, the topic has 1 partition with a replication factor of 3, and node 3 is the leader. The fields are explained as follows:
"leader" is the node responsible for all reads and writes for the given partition. Each node will be the leader for a randomly selected portion of the partitions.
"replicas" is the list of nodes that replicate the log for this partition regardless of whether they are the leader or even if they are currently alive.
"isr" is the set of "in-sync" replicas. This is the subset of the replicas list that is currently alive and caught-up to the leader.

Now look at the test topic created earlier; as the describe command below shows, it has no replication (replication factor 1).
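This reuses the same describe command as above, pointed at the test topic:

# bin/kafka-topics.sh --describe --zookeeper hadoop107:2181,hadoop104:2181,hadoop108:2181 --topic test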

Multiple brokers on multiple nodes
On hadoop105, unzip the downloaded archive into two folders, kafka_4 and kafka_5, then copy the server.properties configuration from hadoop104 into each folder:
# scp -r config/ root@192.168.40.105:/root/hadoop/kafka_4/
# scp -r config/ root@192.168.40.105:/root/hadoop/kafka_5/

Configure and modify the contents as follows:
    kafka_4
        broker.id=4
        port=9095
        host.name=hadoop105
    kafka_5
        broker.id=5
        port=9096
        host.name=hadoop105

Start the services:

# cd kafka_4
# bin/kafka-server-start.sh config/server.properties &
# cd ../kafka_5
# bin/kafka-server-start.sh config/server.properties &


At this point, all 5 brokers on the two physical machines have been started.
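As an optional sanity check (not part of the original steps; the topic name cluster-test is arbitrary), you can create a topic replicated across all 5 brokers and inspect it with the same commands used earlier:

# bin/kafka-topics.sh --create --zookeeper hadoop107:2181,hadoop104:2181,hadoop108:2181 --replication-factor 5 --partitions 1 --topic cluster-test
# bin/kafka-topics.sh --describe --zookeeper hadoop107:2181,hadoop104:2181,hadoop108:2181 --topic cluster-test

If all 5 brokers are up, the describe output should list 5 replicas, all of them in the ISR.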



Summary
A core idea of Kafka is that there is no need to cache data in application memory, because the operating system's file cache is already complete and powerful: as long as writes are not random, sequential read and write performance is very high. Kafka only appends data sequentially, and its deletion strategy is to remove data after it accumulates to a certain size or after a certain period of time.

Another distinctive aspect of Kafka is that consumer state is stored on the client rather than on the MQ server, so the server does not need to track the delivery of each message; each client knows for itself where to resume reading next time. Message delivery also uses a pull model, with the client actively fetching data, which greatly reduces the burden on the server.

Kafka also emphasizes reducing serialization and copying overhead. It organizes messages into Message Sets for batch storage and sending, and when the client pulls data it transmits in zero-copy mode where possible, using sendfile (corresponding to Java's FileChannel.transferTo/transferFrom) to reduce copy overhead.

It can be seen that Kafka is a well-designed MQ system tailored to certain applications. I expect more MQ systems specialized for specific domains to appear, given the value of a vertical product strategy.

As long as disk space holds out and no data is lost, Kafka can retain messages for quite a long time (e.g., one week).
