Getting started with Kafka

Last Update:2018-10-28 Source: Internet

Author: User

Tags zookeeper client


What is Kafka?

Kafka is an open-source stream-processing platform developed by the Apache Software Foundation and written in Scala and Java. It is a high-throughput, distributed publish/subscribe messaging system that can handle all the user-activity stream data of a consumer-scale website.

Basic concepts of Kafka

Broker: a physical concept. Each Kafka node (server) in a Kafka cluster is a broker;
Topic: a logical concept. A topic is a category of Kafka messages and is used to separate and isolate data;
Partition: the basic unit of data storage in Kafka. The data of one topic is distributed across multiple partitions, and the messages within each partition are ordered;
Replication (copies and backups): a partition may have multiple replicas, and all replicas hold the same data;
Replication leader: among the replicas of a partition, one leader is required to handle the interaction with producers and consumers for that partition;
ReplicaManager: manages the partitions and replicas hosted on the current broker, processes requests initiated by the KafkaController, switches replica states (for example when a new leader is elected), and appends and reads messages.

More on Kafka concepts

Partition (minimum storage unit)

Each topic is divided into multiple partitions (the partition is the basic unit of storage);
Within a consumer group, the number of consumers should be smaller than or equal to the number of partitions. Letting several consumers of one group read the same partition would break ordering and cause duplicate processing, so Kafka does not allow it, and any extra consumers stay idle;
Each broker in the cluster stores one or more partitions of a topic. A single partition is stored whole on one broker and is not split across brokers; other brokers only hold replicas of it;
Within a consumer group, each partition of a topic is read by exactly one consumer, while one consumer may read one or more partitions (this prevents the same partition from being consumed by several consumers of the group).
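
The partition-to-consumer rule above can be illustrated with a small simulation. This is only a sketch: the function name assign_partitions is made up for this example, and the round-robin spread it uses is a simplification, not Kafka's actual assignment code.

```shell
# Illustrative only: spreads the partitions of one topic over the consumers
# of one group in round-robin order. Not Kafka's real assignor.

# assign_partitions NUM_PARTITIONS NUM_CONSUMERS
# Prints one line per consumer listing the partitions it owns.
assign_partitions() {
  partitions=$1
  consumers=$2
  c=0
  while [ "$c" -lt "$consumers" ]; do
    owned=""
    p=$c
    # Consumer c takes partitions c, c + consumers, c + 2 * consumers, ...
    while [ "$p" -lt "$partitions" ]; do
      owned="${owned:+$owned }$p"
      p=$((p + consumers))
    done
    # A consumer with no partition left over stays idle.
    echo "consumer-$c: ${owned:-(idle)}"
    c=$((c + 1))
  done
}

echo "3 partitions, 2 consumers:"
assign_partitions 3 2
echo "2 partitions, 3 consumers:"
assign_partitions 2 3
```

In the first case every partition has exactly one owner and one consumer owns two partitions; in the second case the third consumer has nothing to read, which is why adding more consumers than partitions to a group does not increase throughput.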

Replication

When a broker in the cluster crashes, the replicas on other brokers can take over and keep the service available;
By default, the replication factor of a topic is 1 (that is, there are no extra copies by default, which saves resources). You can set it per topic when creating the topic.

Features:

The basic unit of replication is the topic partition;
All read and write operations go through the leader; followers serve only as backups;
Followers must be able to copy the leader's data in a timely manner;
Replication increases fault tolerance and scalability.
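
As a rough illustration of setting the replication factor when creating a topic, the sketch below builds a kafka-topics.sh command line. Several things here are assumptions rather than part of the original article: the topic name demo-topic, the ZooKeeper address localhost:2181, a cluster with at least three brokers (the replication factor cannot exceed the number of brokers), and Kafka's bin directory being on the PATH. The --zookeeper option matches Kafka releases from the time of this article; newer releases use --bootstrap-server instead.

```shell
# Sketch: build (and optionally run) a topic-creation command with an
# explicit partition count and replication factor.

# create_topic_cmd TOPIC PARTITIONS REPLICATION_FACTOR
# Prints the command line instead of running it, so it can be reviewed first.
create_topic_cmd() {
  topic=$1
  partitions=$2
  replication=$3
  echo "kafka-topics.sh --create --zookeeper localhost:2181 --topic $topic --partitions $partitions --replication-factor $replication"
}

# Hypothetical topic: 3 partitions, each kept on 3 brokers.
cmd=$(create_topic_cmd demo-topic 3 3)
echo "$cmd"

# Run the command only when the Kafka CLI is actually installed.
if command -v kafka-topics.sh >/dev/null 2>&1; then
  $cmd
fi
```

Printing the command before running it keeps the sketch harmless on machines without Kafka.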

Basic Structure of Kafka

Kafka message structure

Kafka features

Distributed (multiple partitions, multiple replicas, multiple consumers, coordinated through ZooKeeper);
High performance (high throughput, low latency, high concurrency, O(1) time complexity);
Persistence and scalability (data persistence, fault tolerance, online horizontal scaling, automatic message balancing).

Kafka application scenarios

Message queuing, user-activity tracking, metrics monitoring, log collection, stream processing, event sourcing, commit logs, and so on.

Kafka installation (on Linux)

A JDK and ZooKeeper must be installed first.

Zookeeper installation:

1. Download, decompress, and configure:

wget http://mirror.bit.edu.cn/apache/zookeeper/zookeeper-3.4.12/zookeeper-3.4.12.tar.gz
tar -zxvf zookeeper-3.4.12.tar.gz
# Copy zoo_sample.cfg to zoo.cfg in zookeeper-3.4.12/conf
cp zoo_sample.cfg zoo.cfg
# Modify the following two lines in zoo.cfg. The directories named by dataDir and dataLogDir must exist; otherwise an error is reported when the ZooKeeper server starts.
dataDir=/tmp/zookeeper
dataLogDir=/tmp/zookeeper/log
# This is a single-machine configuration. For a cluster, add the server addresses below clientPort, for example:
# server.1=192.168.180.132:2888:3888
# server.2=192.168.180.133:2888:3888
# ... and so on.
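
Because the server refuses to start when the data directories are missing, it can help to create the directories and the configuration file in one step. The following is a minimal sketch, not an official setup script: the helper name write_zoo_cfg is made up here, and the tickTime, initLimit, syncLimit and clientPort values are assumed to match the defaults shipped in zoo_sample.cfg. The demo writes into a scratch directory so that no real configuration is overwritten.

```shell
# Sketch: create the ZooKeeper data directories and a single-machine zoo.cfg.

# write_zoo_cfg CONF_DIR DATA_DIR
write_zoo_cfg() {
  conf_dir=$1
  data_dir=$2
  # dataDir and dataLogDir must exist before the server starts.
  mkdir -p "$conf_dir" "$data_dir/log"
  cat > "$conf_dir/zoo.cfg" <<EOF
tickTime=2000
initLimit=10
syncLimit=5
dataDir=$data_dir
dataLogDir=$data_dir/log
clientPort=2181
EOF
}

# Demo against a scratch directory (stand-in for zookeeper-3.4.12/conf).
demo_root=$(mktemp -d)
write_zoo_cfg "$demo_root/conf" "$demo_root/data"
cat "$demo_root/conf/zoo.cfg"
```

For a real installation the two arguments would be the conf directory of the unpacked ZooKeeper release and the data directory of your choice, for example /tmp/zookeeper as in the steps above.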

2. Configure environment variables (permanent, for all users):

Modify the /etc/profile file and add the following at the end:

ZOOKEEPER_INSTALL=/usr/local/zookeeper-3.4.12
PATH=$PATH:$ZOOKEEPER_INSTALL/bin
export ZOOKEEPER_INSTALL
export PATH
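
If the profile is edited by a script rather than by hand, running the script twice should not add the same lines twice. The sketch below shows one possible way to do that; the helper name add_line_once is made up for this example, and a temporary file stands in for /etc/profile so the demo changes nothing on the system.

```shell
# Sketch: append a line to a profile file only if it is not already there.

# add_line_once FILE LINE
add_line_once() {
  file=$1
  line=$2
  touch "$file"
  # -x matches whole lines, -F treats the line as a fixed string.
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

profile=$(mktemp)   # stand-in for /etc/profile
# Single quotes keep $PATH and $ZOOKEEPER_INSTALL literal in the file.
add_line_once "$profile" 'export ZOOKEEPER_INSTALL=/usr/local/zookeeper-3.4.12'
add_line_once "$profile" 'export PATH=$PATH:$ZOOKEEPER_INSTALL/bin'
add_line_once "$profile" 'export PATH=$PATH:$ZOOKEEPER_INSTALL/bin'   # no-op
cat "$profile"
```

After changing the real /etc/profile, the new variables take effect in new login shells, or in the current shell after running `. /etc/profile`.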

3. Start and test:

# Go to the bin directory of ZooKeeper and start the server
./zkServer.sh start
# View the status
./zkServer.sh status
# Start the ZooKeeper client (the -server parameter is not required locally)
./zkCli.sh -server 192.168.147.128:2181

Note: If the connection is rejected, check the firewall configuration.
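
To narrow down a refused connection, one option is to ask the server directly whether it is alive. The sketch below sends ZooKeeper's four-letter command ruok and expects the reply imok. It rests on assumptions not stated in the article: the nc (netcat) tool is installed, the server listens on port 2181, and the ruok command is enabled on the server (newer ZooKeeper versions restrict four-letter commands through a whitelist). The function name zk_is_ok is made up for this example.

```shell
# Sketch: check whether a ZooKeeper server answers on a given host and port.

# zk_is_ok HOST PORT
# Succeeds only if the server replies "imok" to the "ruok" command.
zk_is_ok() {
  host=$1
  port=$2
  # -w 2 gives up after two seconds; errors (including a missing nc) are hidden.
  reply=$(printf 'ruok' | nc -w 2 "$host" "$port" 2>/dev/null)
  [ "$reply" = "imok" ]
}

if zk_is_ok 127.0.0.1 2181; then
  echo "ZooKeeper answered imok"
else
  echo "no answer on 127.0.0.1:2181 (server down, wrong port, or blocked)"
fi
```

If the check succeeds on the server machine itself but fails from another host, the firewall between the two is the most likely cause.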

