Introduction to Kafka and Setting up a Cluster Environment

Source: Internet
Author: User
Kafka concept: Kafka is a high-throughput, distributed streaming message system used to process active stream data, such as webpage access records (PV, page views) and logs. It can process big data in real time, and the data can also be processed offline.
Features:
1. High throughput.
2. It is an explicitly distributed system: data producers, brokers, and consumers are assumed to be scattered across multiple machines.
3. State about which data has been consumed is maintained by the consumer as part of consumption, rather than stored on the server side.

Basic knowledge about queues: a message is the basic unit of communication. A message producer (producer) publishes messages to a topic; the messages are physically stored on a broker server. Several consumers subscribe to the topic, and the messages published by the producer are delivered to all subscribers.
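The publish/subscribe model described above can be sketched in a few lines. This is a minimal in-memory illustration, not the Kafka API; the `Broker` class and method names are purely illustrative.

```python
# Minimal in-memory sketch of publish/subscribe: every subscriber of a topic
# receives every message published to that topic. Illustrative only.

class Broker:
    def __init__(self):
        self.topics = {}  # topic name -> list of subscriber callbacks

    def subscribe(self, topic, callback):
        self.topics.setdefault(topic, []).append(callback)

    def publish(self, topic, message):
        # Deliver the message to all subscribers of the topic.
        for callback in self.topics.get(topic, []):
            callback(message)

received_a, received_b = [], []
broker = Broker()
broker.subscribe("logs", received_a.append)
broker.subscribe("logs", received_b.append)
broker.publish("logs", "page view: /index")
```

Both subscriber lists end up holding the same published message, which is the broadcast behavior a plain topic subscription gives you before consumer groups are introduced.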

Kafka is an explicitly distributed system: producers, consumers, and brokers can run on different machines yet cooperate as a logical unit in a coordinated cluster.

Consumer group: each consumer process belongs to a consumer group, and each message is delivered to only one consumer process within that group. Logically, the whole group can be viewed as a single consumer: no matter how many consumer processes the group contains, each message is consumed only once within the group, because the consumers in the same group share the consumption state.
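The group semantics above can be illustrated with a small simulation: every group sees every message, but inside a group only one consumer receives each one. The round-robin assignment and all names here are assumptions for illustration, not Kafka's actual assignment algorithm.

```python
# Sketch of consumer-group delivery: each message goes to exactly one
# consumer per group, while every group sees every message.
from itertools import cycle

class ConsumerGroup:
    def __init__(self, consumer_names):
        self.inboxes = {name: [] for name in consumer_names}
        self._next = cycle(consumer_names)  # round-robin within the group

    def deliver(self, message):
        # Only one consumer in the group receives this message.
        self.inboxes[next(self._next)].append(message)

group_a = ConsumerGroup(["a1", "a2"])
group_b = ConsumerGroup(["b1"])

for msg in ["m1", "m2", "m3"]:
    # Each group receives the message; inside a group only one consumer gets it.
    group_a.deliver(msg)
    group_b.deliver(msg)
```

After the loop, group_a has split the three messages between its two consumers, while group_b's single consumer received all three: the group as a whole behaves like one logical consumer.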

In Kafka, the consumer is responsible for maintaining the state (offset) of the messages it has consumed, and Kafka saves this state in ZooKeeper. When a Hadoop job loads data from Kafka in parallel, each mapper stores its offset to HDFS before the map task ends. This mechanism can also be used to roll back and re-read data.
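Consumer-side offset tracking can be sketched as follows. The dictionary standing in for the offset store (ZooKeeper or HDFS in the text above) and the `poll` function are illustrative assumptions, not Kafka's API.

```python
# Sketch of consumer-side offset tracking: the consumer, not the broker,
# records how far it has read, and can roll back by resetting the offset.

log = ["msg-%d" % i for i in range(5)]   # the partition's append-only log
offset_store = {"group1": 0}             # persisted consumed position

def poll(group, max_records=2):
    start = offset_store[group]
    batch = log[start:start + max_records]
    offset_store[group] = start + len(batch)  # commit the new offset after reading
    return batch

first = poll("group1")       # reads from offset 0
second = poll("group1")      # reads from offset 2
offset_store["group1"] = 2   # roll back: re-read from offset 2
replay = poll("group1")      # same records as `second` are read again
```

Because the consumed position lives outside the log itself, rewinding is just writing a smaller offset back to the store; the broker keeps no per-consumer state.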

Distribution mechanism:
Kafka usually runs on a cluster of servers. There is no central "master" node: brokers are peers and can be added or removed at any time without manual reconfiguration. Likewise, producers and consumers can be started at any time. Each broker registers some metadata (for example, the available topics) in ZooKeeper (a distributed coordination system). Producers and consumers use ZooKeeper to discover topics and coordinate with each other. Details about producers and consumers are described below.
Consumers and producers achieve load balancing through partitioning.
Topic: used to distinguish different types of data. A topic is divided into partitions, numbered starting from 0 (0, 1, 2, 3...). For each partition, the leader handles reads and writes, while the followers synchronize the data; this provides high throughput and load balancing. The producer writes data to different partitions, and the same principle applies to consumers, so the read/write load is spread across the consumers reading from different partitions. Consumed data is read from the primary partition (the leader). Consumer group: consumers in the same group share consumption state, so the same data only needs to be read once within the group.
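The producer-side load balancing described above is often done by hashing the message key modulo the partition count, so the same key always lands on the same partition. The sketch below is an illustration of that idea under assumed names (`partition_for`, `send`); Kafka's real partitioner is configurable.

```python
# Sketch of key-based partitioning: hash the key modulo the partition count
# so writes spread across partitions and a given key stays on one partition.
import zlib

NUM_PARTITIONS = 3
partitions = {p: [] for p in range(NUM_PARTITIONS)}

def partition_for(key):
    # zlib.crc32 gives a stable, non-randomized hash across runs.
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

def send(key, value):
    partitions[partition_for(key)].append((key, value))

for user in ["alice", "bob", "carol", "alice"]:
    send(user, "click")
```

Both "alice" records land on the same partition, which also means a single consumer in a group sees that key's messages in order.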
Install Kafka:

1. Upload kafka_2.9.2-0.8.1.1.tgz to the server.
2. Single-node Kafka:
a. Start the ZooKeeper cluster first. Running bin/kafka-server-start.sh config/server.properties may report:
   Unrecognized VM option '+UseCompressedOops'
   Error: Could not create the Java Virtual Machine.
   Error: A fatal exception has occurred. Program will exit.
The reason is a JDK version mismatch. Modify the configuration file and remove the option -XX:+UseCompressedOops.
b. Start a server:
   bin/kafka-server-start.sh config/server.properties
c. List topics:
   bin/kafka-topics.sh --list --zookeeper localhost:2181
Create a topic:
   bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
View a topic's description:
   bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic
d. Test: start a producer:
   bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
Start a consumer:
   bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
3. Cluster setup: modify the configuration file:
   vim config/server-1.properties
At the end, configure the ZooKeeper cluster:
   zookeeper.connect=storm01:2181,storm02:2181,storm03:2181
Then copy the Kafka installation to the other servers:
   scp -r /usr/itcast/kafka [email protected]:/usr/itcast/
   scp -r /usr/itcast/kafka [email protected]:/usr/itcast/
On each server, modify the configuration file (vim config/server-1.properties) to give it a unique broker.id (broker.id=1, broker.id=2), then start and test.




 
