I. About Kafka
Kafka is a high-throughput, distributed publish-subscribe messaging system originally built to handle the activity-stream data of a consumer-scale website. Activity data (page views, searches, and other user actions) underlies many features of modern web applications, and because of its volume it has traditionally been handled by writing log files and periodically aggregating them. That approach works well for feeding offline analysis systems such as Hadoop, but it cannot satisfy real-time processing requirements. Kafka aims to unify online and offline message processing: it can load data into Hadoop in parallel for offline analysis while also supporting real-time consumption across a cluster of machines.
For more information about Kafka, see: http://www.infoq.com/cn/articles/apache-kafka/
II. Preparatory Work
1. Configure a static IP on each host. This ensures the hosts can always reach each other; to avoid unnecessary network transmission, it is recommended to keep them on the same network segment.
2. Set each machine's host name. Every host in the Kafka cluster needs a distinct host name.
3. Configure host mappings. Edit /etc/hosts on each machine so that it maps every host's IP address to its host name.
4. Open the required ports. The ports used in the configuration below must be open (or the firewall must be disabled); this requires root privileges.
5. Make sure the ZooKeeper cluster is running properly. In practice, if the ZooKeeper cluster was deployed successfully, most of the preparation above is already done. For ZooKeeper deployment, see: http://www.cnblogs.com/wxisme/p/5178211.html
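As a sketch, the host mapping in /etc/hosts on every machine might look like the following (the three node names and the 192.168.1.x addresses are placeholder assumptions; substitute your own):

```
# /etc/hosts on every node (example addresses)
192.168.1.101 node1
192.168.1.102 node2
192.168.1.103 node3
```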
III. Installing Kafka
1. Download the Kafka installation package from the Kafka website, choosing the appropriate version. The version used here is kafka_2.9.2-0.8.1.1.
2. Unpack the archive with the following command:
tar -zxvf kafka_2.9.2-0.8.1.1.tgz
3. Modify the configuration file. For a simple setup, only config/server.properties needs to be edited:
vim config/server.properties
The properties to modify are:
broker.id (the unique ID of this broker in the cluster, starting from 0); port; host.name (this server's host name); zookeeper.connect (the ZooKeeper cluster to connect to); log.dirs (the directory where Kafka stores its log data, which must be created in advance).
Example:
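A minimal server.properties for the first node might look like this (the host names, port, and log directory are example assumptions):

```
broker.id=0
port=9092
host.name=node1
zookeeper.connect=node1:2181,node2:2181,node3:2181
log.dirs=/usr/kafka/kafka-logs
```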
4. Copy the configured Kafka directory to the other nodes:
scp -r kafka node2:/usr/
Note that after copying, do not forget to adjust the node-specific settings, such as broker.id and host.name, on each node.
IV. Start and Test Kafka
1. Start ZooKeeper first, then start Kafka with the command below; a message is printed once the broker starts successfully.
./bin/kafka-server-start.sh config/server.properties &
2. Test Kafka. Create a topic, a producer, and a consumer, preferably on different nodes. Type messages in the producer's console and check whether the consumer's console receives them.
Create topic:
./bin/kafka-topics.sh --create --zookeeper node1:2181,node2:2181,node3:2181 --replication-factor 2 --partitions 3 --topic test
List topics:
./bin/kafka-topics.sh --list --zookeeper node1:2181,node2:2181,node3:2181
Create producer:
./bin/kafka-console-producer.sh --broker-list node1:9092,node2:9092,node3:9092 --topic test
Create consumer:
./bin/kafka-console-consumer.sh --zookeeper node1:2181,node2:2181,node3:2181 --from-beginning --topic test
Test: type messages in the producer console and verify that the consumer console receives them. (Screenshots of the producer and consumer consoles omitted.)
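As a sketch of what a successful test looks like (the two messages here are assumed samples, not output from the original deployment):

```
# producer console (e.g. on node1) -- type each message and press Enter
hello kafka
this is a test

# consumer console (e.g. on node2) -- the same messages appear shortly after
hello kafka
this is a test
```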
After the configuration and tests above, the initial Kafka deployment is complete; it can now be configured and operated according to your specific requirements. For more information about Kafka and more detailed usage, see the official documentation: https://cwiki.apache.org/confluence/display/KAFKA/Index