Kafka Introduction + pseudo-cluster deployment
Kafka is a very popular message-queuing middleware, commonly used for general message queuing, website activity tracking (page views, traffic, clicks, etc.), and log collection (feeding a big-data storage engine for offline analysis).
This content was collected from the web and its accuracy remains to be verified! If you find any mistakes, please point them out.

Concept Introduction
In Kafka, message queuing involves three roles:
- producer: generates log data and sends it to brokers.
- broker: a storage node; it organizes data by topic into partitions and distributes those partitions evenly across the cluster.
- consumer: reads data from the partitions on the brokers.
Producer
A producer in Kafka generates data and sends it to brokers for storage. Because the producer must maintain socket connections to the partitions on the brokers, the correspondence between producers and the brokers holding those partitions is maintained in ZooKeeper (ZK). Data under the same topic is sent to different partitions in a load-balanced manner.
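The load-balanced spread of a topic's data across partitions can be sketched as follows. This is a minimal illustration, not Kafka's actual partitioner: `choose_partition` and the round-robin counter are assumed names for this sketch.

```python
# Sketch only (not Kafka's real implementation): a producer spreading
# messages across a topic's partitions -- hash-based when a key is given,
# round-robin otherwise.

def choose_partition(key, num_partitions, counter):
    """Pick a partition: hash of the key if present, else round-robin."""
    if key is not None:
        return hash(key) % num_partitions
    return counter % num_partitions

partitions = {p: [] for p in range(3)}
for i, msg in enumerate(["m0", "m1", "m2", "m3", "m4", "m5"]):
    p = choose_partition(None, num_partitions=3, counter=i)
    partitions[p].append(msg)

print(partitions)  # {0: ['m0', 'm3'], 1: ['m1', 'm4'], 2: ['m2', 'm5']}
```

Keyless messages land evenly on all three partitions; a keyed message would always land on the same partition, which is what preserves per-key ordering.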
Broker
A broker is a storage node in Kafka; data is organized by topic and assigned to different partitions in a load-balanced manner. A topic consists of multiple partitions, and each partition can be configured with a number of replicas. A partition consists of one leader plus several followers: the producer communicates directly with the leader, and once the leader receives a message, the followers synchronize it. Only after all followers have synchronized the message does it become consumable.
The topic-to-partition mapping in a broker, the partition-to-producer mapping, leader election among a partition's replicas, and so on are all coordinated through ZK.
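The leader/follower rule above ("a message becomes consumable only after all followers have synchronized it") can be sketched as a toy model. The class and field names here are invented for illustration; real Kafka tracks this with a high watermark over the in-sync replica set.

```python
# Toy model (assumed names, not Kafka's real classes): a partition leader
# that accepts writes but only exposes a message to consumers after every
# follower has replicated it.

class Partition:
    def __init__(self, num_followers):
        self.log = []                                  # leader's log
        self.followers = [[] for _ in range(num_followers)]
        self.high_watermark = 0                        # messages below this are consumable

    def append(self, msg):
        self.log.append(msg)                           # producers talk to the leader only

    def replicate(self):
        for f in self.followers:                       # followers copy the leader's log
            f[:] = self.log
        # once all followers are in sync, the messages become consumable
        self.high_watermark = len(self.log)

    def consumable(self):
        return self.log[:self.high_watermark]

p = Partition(num_followers=2)
p.append("hello")
print(p.consumable())  # [] -- written to the leader, but not yet replicated
p.replicate()
print(p.consumable())  # ['hello']
```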
Consumer
Consumers in Kafka usually work in groups; a group contains multiple consumers and corresponds to a topic. Each partition of that topic can be consumed by only one consumer in the group: if there are more consumers than partitions, some consumers receive no data; if there are fewer, some consumers each consume several partitions.
Kafka only guarantees that the messages within a single partition are consumed in order; no ordering is guaranteed across partitions.
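The assignment rule above can be sketched with a simple round-robin assignor. This is an illustration only: Kafka's real assignors (range, round-robin, sticky) differ in detail, and `assign` is a name invented for this sketch.

```python
# Sketch of group assignment: within a group, each partition goes to exactly
# one consumer; surplus consumers sit idle, scarce consumers take several
# partitions. (Round-robin here; Kafka's actual assignors differ.)

def assign(partitions, consumers):
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# More consumers than partitions: c2 gets nothing.
print(assign([0, 1], ["c0", "c1", "c2"]))   # {'c0': [0], 'c1': [1], 'c2': []}
# Fewer consumers than partitions: each takes several.
print(assign([0, 1, 2, 3], ["c0", "c1"]))   # {'c0': [0, 2], 'c1': [1, 3]}
```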
To control the reliability of consumption, Kafka provides three delivery semantics:
- 1 At most once: save the offset first, then process the data; if processing fails, that data can no longer be accessed.
- 2 At least once: process the data first, then save the offset; if saving fails, the same data may be fetched again next time.
- 3 Exactly once: details to be verified.
In Kafka, the offset is maintained by the consumer (in practice it can be stored in ZK). This mechanism has two benefits:
- One is that each consumer consumes at its own pace, avoiding pressure when data is produced faster than it is consumed;
- The other is that the amount of data fetched per request is configurable: a consumer can read 1 message at a time, or 100.
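The difference between the first two delivery semantics comes down to whether the offset is committed before or after processing, and what happens if a crash falls in between. A minimal sketch (invented function and parameter names) makes the trade-off concrete:

```python
# Sketch of delivery semantics as commit ordering: "at most once" commits
# the offset before processing; "at least once" processes before committing.
# A crash between the two steps shows the difference.

def consume(log, start, commit_first, crash_between_steps):
    """Return (processed messages, committed offset)."""
    processed, offset = [], start
    msg = log[offset]
    if commit_first:                          # at most once: commit, then process
        offset += 1
        if crash_between_steps:
            return processed, offset          # offset advanced -- message lost
        processed.append(msg)
    else:                                     # at least once: process, then commit
        processed.append(msg)
        if crash_between_steps:
            return processed, offset          # offset not saved -- redelivered later
        offset += 1
    return processed, offset

log = ["m0"]
# at most once + crash: the message is never processed
print(consume(log, 0, commit_first=True, crash_between_steps=True))   # ([], 1)
# at least once + crash: the message is processed but will be fetched again
print(consume(log, 0, commit_first=False, crash_between_steps=True))  # (['m0'], 0)
```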
Topic
A topic is the subject of data in Kafka; all operations (such as message storage, reading, and consumption) are performed per topic.
Partition
Each topic is composed of multiple partitions, and the data within a partition is guaranteed to be ordered: messages are appended to the tail of the partition in time order. On disk a partition is stored as fixed-size files (segments); when the current file is full, a new one is created, and Kafka periodically deletes files older than the configured retention period.
This continuous file storage exploits sequential disk access on the one hand, and reduces memory pressure on the other.
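The append-and-expire layout described above can be sketched with lists standing in for log files. The capacity constant and function names are made up for illustration; real Kafka segment sizes and retention are configurable.

```python
# Sketch of a partition's on-disk layout: messages are appended in order to
# fixed-size files, and expired files are dropped wholesale -- no in-place
# rewrites, which is what makes disk access sequential.

SEGMENT_CAPACITY = 3  # invented for the sketch; real segment sizes are configurable

def append(segments, msg):
    if not segments or len(segments[-1]) >= SEGMENT_CAPACITY:
        segments.append([])               # roll a new file when the last one is full
    segments[-1].append(msg)

def expire(segments, keep_last):
    return segments[-keep_last:]          # deleting whole old files is cheap

segments = []
for i in range(7):
    append(segments, f"m{i}")
print(segments)                           # [['m0','m1','m2'], ['m3','m4','m5'], ['m6']]
print(expire(segments, keep_last=2))      # [['m3','m4','m5'], ['m6']]
```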
Zookeeper
In Kafka, much of the coordination between nodes and the allocation of resources relies on ZooKeeper.
Such as:
- 1 Broker registration: each broker's IP and port are saved in ZK;
- 2 Topic registration: managing how a topic's partitions are distributed across brokers;
- 3 Broker load balancing: dynamically assigning topics to brokers, based on topic distribution and broker load;
- 4 Consumers: each partition's messages are delivered to only one consumer (the exact role ZK plays here is unclear);
- 5 The consumer-to-partition correspondence is stored in ZK;
- 6 Consumer load balancing: whenever consumers join or leave, a rebalance among the consumers is triggered;
- 7 Offsets: with the high-level consumer API, offset information is maintained in ZK.
Construction of pseudo-cluster environment
Deploying a pseudo-clustered environment, which is a single-node environment, is straightforward. Download the deployment file, unzip it, and run it directly.
Run the command as follows:
```shell
# Start ZooKeeper
bin/zookeeper-server-start.sh config/zookeeper.properties &
# Start Kafka
bin/kafka-server-start.sh config/server.properties &
```
If you want to test, you can start the test program:
```shell
# Start the console producer (test program)
./kafka-console-producer.sh --broker-list localhost:9092 --topic test
# Start the console consumer (test program)
./kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
```
The content entered in the producer interface can be seen directly in the consumer interface.