Introduction to Kafka and Setting up a Cluster Environment

Source: Internet
Author: User
Kafka concept: Kafka is a high-throughput, distributed streaming message system used to process active stream data, such as webpage access records (PV, page views) and logs. It can process big data in real time, and the data can also be processed offline.
Features:
1. High throughput.
2. It is an explicitly distributed system: data producers, brokers, and consumers are assumed to be scattered across multiple machines.
3. State about which data has been consumed is maintained by the consumer as part of consumption, rather than stored on the server side.

Basic knowledge about queues: a message is the basic unit of communication. A message producer (producer) publishes messages to a topic; the messages are physically stored on a broker server. Several consumers subscribe to the topic, and the messages published by the producer are delivered to all subscribers.
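The publish/subscribe model described above can be sketched in a few lines. This is a minimal in-memory illustration, not the Kafka API; the `Broker` class and method names are purely illustrative.

```python
# Minimal in-memory sketch of publish/subscribe: every subscriber of a topic
# receives every message published to that topic. Illustrative only.

class Broker:
    def __init__(self):
        self.topics = {}  # topic name -> list of subscriber callbacks

    def subscribe(self, topic, callback):
        self.topics.setdefault(topic, []).append(callback)

    def publish(self, topic, message):
        # Deliver the message to all subscribers of the topic.
        for callback in self.topics.get(topic, []):
            callback(message)

received_a, received_b = [], []
broker = Broker()
broker.subscribe("logs", received_a.append)
broker.subscribe("logs", received_b.append)
broker.publish("logs", "page view: /index")
```

Both subscriber lists end up holding the same published message, which is the broadcast behavior a plain topic subscription gives you before consumer groups are introduced.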

Kafka is an explicitly distributed system: producers, consumers, and brokers can run on different machines yet cooperate as a logical unit in a coordinated cluster.

Consumer group: each consumer process belongs to a consumer group, and each message is delivered to only one consumer process within that group. Logically, the whole group can be viewed as a single consumer: no matter how many consumer processes the group contains, each message is consumed only once within the group, because the consumers in the same group share the consumption state.
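The group semantics above can be illustrated with a small simulation: every group sees every message, but inside a group only one consumer receives each one. The round-robin assignment and all names here are assumptions for illustration, not Kafka's actual assignment algorithm.

```python
# Sketch of consumer-group delivery: each message goes to exactly one
# consumer per group, while every group sees every message.
from itertools import cycle

class ConsumerGroup:
    def __init__(self, consumer_names):
        self.inboxes = {name: [] for name in consumer_names}
        self._next = cycle(consumer_names)  # round-robin within the group

    def deliver(self, message):
        # Only one consumer in the group receives this message.
        self.inboxes[next(self._next)].append(message)

group_a = ConsumerGroup(["a1", "a2"])
group_b = ConsumerGroup(["b1"])

for msg in ["m1", "m2", "m3"]:
    # Each group receives the message; inside a group only one consumer gets it.
    group_a.deliver(msg)
    group_b.deliver(msg)
```

After the loop, group_a has split the three messages between its two consumers, while group_b's single consumer received all three: the group as a whole behaves like one logical consumer.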

In Kafka, the consumer is responsible for maintaining the state (offset) of the messages it has consumed, and Kafka saves this state in ZooKeeper. When a Hadoop job loads data from Kafka in parallel, each mapper stores its offset to HDFS before the map task ends. This mechanism can also be used to roll back and re-read data.
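Consumer-side offset tracking can be sketched as follows. The dictionary standing in for the offset store (ZooKeeper or HDFS in the text above) and the `poll` function are illustrative assumptions, not Kafka's API.

```python
# Sketch of consumer-side offset tracking: the consumer, not the broker,
# records how far it has read, and can roll back by resetting the offset.

log = ["msg-%d" % i for i in range(5)]   # the partition's append-only log
offset_store = {"group1": 0}             # persisted consumed position

def poll(group, max_records=2):
    start = offset_store[group]
    batch = log[start:start + max_records]
    offset_store[group] = start + len(batch)  # commit the new offset after reading
    return batch

first = poll("group1")       # reads from offset 0
second = poll("group1")      # reads from offset 2
offset_store["group1"] = 2   # roll back: re-read from offset 2
replay = poll("group1")      # same records as `second` are read again
```

Because the consumed position lives outside the log itself, rewinding is just writing a smaller offset back to the store; the broker keeps no per-consumer state.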

Distribution mechanism:
Kafka usually runs on a cluster of servers. There is no central "master" node: brokers are peers and can be added or removed at any time without manual reconfiguration. Likewise, producers and consumers can be started at any time. Each broker registers some metadata (for example, the available topics) in ZooKeeper (a distributed coordination system). Producers and consumers use ZooKeeper to discover topics and coordinate with each other. Details about producers and consumers are described below.
Consumers and producers achieve load balancing through partitioning.
Topic: used to distinguish different types of data. A topic is divided into partitions, numbered starting from 0 (0, 1, 2, 3...). For each partition, the leader handles reads and writes, while the followers synchronize the data; this provides high throughput and load balancing. The producer writes data to different partitions, and the same principle applies to consumers, so the read/write load is spread across the consumers reading from different partitions. Consumed data is read from the primary partition (the leader). Consumer group: consumers in the same group share consumption state, so the same data only needs to be read once within the group.
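The producer-side load balancing described above is often done by hashing the message key modulo the partition count, so the same key always lands on the same partition. The sketch below is an illustration of that idea under assumed names (`partition_for`, `send`); Kafka's real partitioner is configurable.

```python
# Sketch of key-based partitioning: hash the key modulo the partition count
# so writes spread across partitions and a given key stays on one partition.
import zlib

NUM_PARTITIONS = 3
partitions = {p: [] for p in range(NUM_PARTITIONS)}

def partition_for(key):
    # zlib.crc32 gives a stable, non-randomized hash across runs.
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

def send(key, value):
    partitions[partition_for(key)].append((key, value))

for user in ["alice", "bob", "carol", "alice"]:
    send(user, "click")
```

Both "alice" records land on the same partition, which also means a single consumer in a group sees that key's messages in order.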
Install Kafka:

1. Upload kafka_2.9.2-0.8.1.1.tgz to the server.
2. Single-node Kafka:
a. Start the ZooKeeper cluster first. Running bin/kafka-server-start.sh config/server.properties may report:
   Unrecognized VM option '+UseCompressedOops'
   Error: Could not create the Java Virtual Machine.
   Error: A fatal exception has occurred. Program will exit.
The reason is a JDK version mismatch. Modify the configuration file and remove the option -XX:+UseCompressedOops.
b. Start a server:
   bin/kafka-server-start.sh config/server.properties
c. List topics:
   bin/kafka-topics.sh --list --zookeeper localhost:2181
Create a topic:
   bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
View a topic's description:
   bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic
d. Test: start a producer:
   bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
Start a consumer:
   bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
3. Cluster setup: modify the configuration file:
   vim config/server-1.properties
At the end, configure the ZooKeeper cluster:
   zookeeper.connect=storm01:2181,storm02:2181,storm03:2181
Then copy the Kafka installation to the other servers:
   scp -r /usr/itcast/kafka [email protected]:/usr/itcast/
   scp -r /usr/itcast/kafka [email protected]:/usr/itcast/
On each server, modify the configuration file (vim config/server-1.properties) to give it a unique broker.id (broker.id=1, broker.id=2), then start and test.




 
