Introduction to Kafka and Deployment of a Pseudo-Cluster
Kafka is a popular message-queue middleware. It is commonly used for website activity tracking (page views, traffic, clicks, etc.) and for log collection, feeding the collected logs into a big data storage engine for offline analysis.
All content in this article was collected from the web, so its accuracy should be verified independently. If you find any mistakes, please point them out.

Concepts
In Kafka, the message queue involves three roles:
- producer: the producer, responsible for generating log data.
- broker: the storage node, responsible for storing data in the partitions under each topic, distributed evenly across partitions.
- consumer: the consumer, responsible for reading data from the broker.
Producer
The producer in Kafka generates data and sends it to a broker for storage. To maintain socket connections with the partitions on the brokers, the mapping between producers and partition brokers must be kept in ZooKeeper. Data under the same topic is sent to different partitions in a load-balanced manner.
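As a rough illustration of this load balancing, the following Python sketch (a hypothetical helper, not the actual Kafka client API) shows one common strategy: hash the message key to a stable partition when a key is present, and fall back to round-robin otherwise.

```python
import zlib
from itertools import count

# Round-robin counter for messages without a key.
_round_robin = count()

def choose_partition(key, num_partitions):
    """Pick a partition: stable hash for keyed messages, round-robin otherwise."""
    if key is not None:
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        return zlib.crc32(key.encode("utf-8")) % num_partitions
    return next(_round_robin) % num_partitions
```

Keyed messages always land in the same partition (preserving per-key order), while unkeyed messages are spread evenly.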
Broker
The broker is the storage node in Kafka. Data is organized by topic and distributed across partitions in a load-balanced manner. A topic consists of multiple partitions, and each partition can be configured with a number of replicas. A partition has one leader and several followers; the producer communicates directly with the leader. After the leader receives a message, the followers synchronize it, and only after all followers have synchronized the message does it become consumable.
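The "consumable only after all followers have synchronized" rule can be sketched as a toy model (a simplified illustration, not actual broker code; the class and method names are invented):

```python
# Toy model of a partition with one leader log and N followers:
# a message is consumable only once every follower has acknowledged it.

class Partition:
    def __init__(self, num_followers):
        self.log = []            # messages in the leader's log
        self.acks = []           # follower ack count per offset
        self.num_followers = num_followers

    def append(self, message):
        """Leader appends a message; returns its offset."""
        self.log.append(message)
        self.acks.append(0)
        return len(self.log) - 1

    def follower_ack(self, offset):
        """A follower reports it has replicated the message at `offset`."""
        self.acks[offset] += 1

    def consumable(self):
        """Prefix of the log replicated by all followers."""
        n = 0
        while n < len(self.log) and self.acks[n] >= self.num_followers:
            n += 1
        return self.log[:n]
```

With two followers, a message appended to the leader stays invisible to consumers until both followers have acknowledged it.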
Within the broker, metadata such as the mapping between topics and partitions, the mapping between partitions and producers, and leader election among a partition's replicas are all coordinated through ZooKeeper.
Consumer
A consumer in Kafka usually exists as part of a group. A group contains multiple consumers, each group subscribes to one topic, and each partition in the topic can be consumed by only one consumer in the group. As a result, if there are more consumers than partitions, some consumers receive no data; if there are fewer, a single consumer consumes data from multiple partitions at once.
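The partition-to-consumer rule can be sketched as a simple round-robin assignment (a simplified model; Kafka's real partition assignors are more elaborate):

```python
# Assign each partition to exactly one consumer in the group.
# Surplus consumers end up with no partitions; with fewer consumers,
# each consumer takes several partitions.

def assign_partitions(partitions, consumers):
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment
```

For 3 partitions and 4 consumers, one consumer sits idle; for 4 partitions and 2 consumers, each consumer reads from two partitions.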
Kafka only guarantees that messages within a single partition are consumed in order; ordering across multiple partitions is not guaranteed.
To ensure reliable consumption, Kafka offers several delivery semantics:
- 1. at most once: the offset is saved before the data is processed, so if processing fails, the data cannot be obtained again.
- 2. at least once: the offset is saved after the data is processed, so if an error occurs before the commit, the same data may be fetched again next time.
- 3. exactly once: to be studied.
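The difference between the first two semantics is simply where the offset commit happens relative to processing. A toy simulation (hypothetical, not real client code) of a consumer that crashes mid-batch:

```python
def consume(messages, start, commit_first, crash_at=None):
    """Return (processed, committed_offset) for one consumer run.

    commit_first=True  -> at most once  (commit, then process)
    commit_first=False -> at least once (process, then commit)
    """
    processed, offset = [], start
    for i in range(start, len(messages)):
        if commit_first:
            offset = i + 1             # commit before processing
        if i == crash_at:
            return processed, offset   # simulated crash
        processed.append(messages[i])
        if not commit_first:
            offset = i + 1             # commit after processing
    return processed, offset
```

With `commit_first=True`, a crash while handling message 1 leaves the committed offset at 2, so that message is lost; with `commit_first=False`, the offset stays at 1, so the message is re-read on restart and may be processed twice.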
In Kafka, the offset is maintained by the consumer (this can also be delegated to ZooKeeper). This mechanism has two advantages:
- first, consumers pull data at their own pace, avoiding the pressure of data being pushed faster than they can handle;
- second, the number of records per fetch can be customized: you can read one record at a time or 100 records at a time.
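This consumer-driven pull with a configurable batch size can be sketched as follows (a simplified model, not the real fetch protocol):

```python
def fetch(log, offset, max_records):
    """Pull at most max_records from the log, starting at the consumer's offset."""
    batch = log[offset : offset + max_records]
    return batch, offset + len(batch)   # new offset the consumer will save
```

The consumer, not the broker, decides both where to start (`offset`) and how much to take (`max_records`).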
Topic
All data operations in Kafka, such as message storage, reading, and consumption, are performed against a topic.
Partition
Each topic is composed of multiple partitions. Within each partition the data is ordered: messages are appended to the end of the partition in time sequence. A partition is stored as segment files of a fixed size; when the current segment is full, a new one is created, and Kafka periodically deletes segments that have expired.
This sequential file storage both takes advantage of fast linear disk access and reduces memory pressure.
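The append-only, fixed-size file layout can be modeled roughly as follows (a simplified sketch; real Kafka segments are on-disk files indexed by offset):

```python
class PartitionLog:
    """Toy partition: a list of fixed-size, append-only segments."""

    def __init__(self, segment_size):
        self.segment_size = segment_size
        self.segments = [[]]

    def append(self, message):
        # Roll to a fresh segment once the active one is full.
        if len(self.segments[-1]) >= self.segment_size:
            self.segments.append([])
        self.segments[-1].append(message)

    def expire_oldest(self):
        # Retention works on whole segments: drop the oldest file.
        if len(self.segments) > 1:
            self.segments.pop(0)
```

Writes only ever touch the tail of the last segment (sequential disk access), and expiry deletes whole old segments rather than rewriting files.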
Zookeeper
In Kafka, scheduling and resource allocation for many components depend on ZooKeeper.
For example:
- 1. Register brokers, saving each broker's IP address and port.
- 2. Register topics, managing each topic's partitions and their distribution across brokers.
- 3. Broker load balancing: dynamically distribute topics to brokers according to the topic distribution and the load on each broker.
- 4. Consumer registration: the messages in each partition are delivered to only one consumer (the exact role ZooKeeper plays here is unclear to me).
- 5. The mapping between consumers and partitions is stored in ZooKeeper.
- 6. Consumer load balancing: whenever a consumer joins or leaves, a rebalance across the consumers is triggered.
- 7. With the high-level consumer API, offset information is maintained in ZooKeeper; with the low-level API, the consumer maintains offsets itself.
Build a pseudo Cluster Environment
Deploying the pseudo-cluster environment, i.e. a single-node setup, is very simple: download the release package, decompress it, and run it directly.
Run the following command:
# Start ZooKeeper
bin/zookeeper-server-start.sh config/zookeeper.properties &
# Start Kafka
bin/kafka-server-start.sh config/server.properties &
If you want to test, you can start the test program:
# Start the producer test program
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
# Start the consumer test program
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
The content entered on the producer interface can be seen directly on the consumer interface.