Kafka for Log Collection

Source: Internet
Author: User
Tags: benchmark

http://www.jianshu.com/p/f78b773ddde5

I. Introduction

Kafka is a distributed, publish/subscribe-based messaging system. Its main design objectives are as follows:

    • Provides message persistence with O(1) time complexity, guaranteeing constant-time access performance even for terabytes of data or more
    • High throughput: a single machine can support the transmission of 100K messages per second, even on very inexpensive commodity hardware
    • Supports partitioning messages across Kafka servers and distributed consumption, while guaranteeing in-order delivery of messages within each partition
    • Supports both offline data processing and real-time data processing
    • Scale out: supports online horizontal scaling
Architecture and principles

Kafka's architecture and principles have presumably been explained in many places already, so the finer details of Kafka's internal workflow are left for another article. The overall architecture is as follows:


Overall architecture


As shown, a typical Kafka cluster contains several producers (which can be page views generated by the Web front end, server logs, system CPU and memory metrics, etc.), several brokers (Kafka supports horizontal expansion; generally, the more brokers, the higher the cluster throughput), several consumer groups, and a ZooKeeper cluster. Kafka manages the cluster configuration through ZooKeeper, elects the leader, and rebalances when consumer group membership changes. Producers publish messages to brokers using a push model, while consumers subscribe to and consume messages from brokers using a pull model.

II. Installation

To install Kafka on CentOS, I recommend the Confluent Platform packages for Kafka, from which you can choose the components you need.

2.1 Yum Installation
    1. Import the GPG key:

       sudo rpm --import http://packages.confluent.io/rpm/2.0/archive.key

    2. Add the Yum source as confluent.repo:

       [confluent-2.0]
       name=Confluent repository for 2.0.x packages
       baseurl=http://packages.confluent.io/rpm/2.0
       gpgcheck=1
       gpgkey=http://packages.confluent.io/rpm/2.0/archive.key
       enabled=1

    3. sudo yum install confluent-platform-2.11.7 completes the installation, which includes confluent-kafka-2.11.7, confluent-schema-registry, and other components.

You can get started quickly as soon as the installation is complete.
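For example, a single-node setup can be brought up right after the yum install. This is a minimal sketch; the binary and configuration paths are assumptions based on the typical Confluent package layout and may differ on your system:

```shell
# Assumed Confluent package paths; verify them on your installation.
/usr/bin/zookeeper-server-start /etc/kafka/zookeeper.properties &
/usr/bin/kafka-server-start /etc/kafka/server.properties &
```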

III. Kafka command line

After Kafka is installed, many command-line tools are available for testing it. Here are a few examples.

3.1 kafka-topics

Creates, alters, lists, and describes topics. Examples:

 [root@localhost ~]# /usr/bin/kafka-topics --zookeeper zk01.example.com:2181 --list
 sink1
 test
 [root@localhost ~]# /usr/bin/kafka-topics --zookeeper zk01.example.com:2181 --create --topic
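The --create invocation above is truncated in the original. A complete command might look like the following sketch; the topic name, partition count, and replication factor are illustrative assumptions, not values from the original:

```shell
# Hypothetical complete form of a kafka-topics --create command.
# Topic name "test2", 6 partitions, and replication factor 1 are assumptions.
/usr/bin/kafka-topics --zookeeper zk01.example.com:2181 \
  --create --topic test2 --partitions 6 --replication-factor 1
```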
3.2 kafka-console-consumer

Reads data from Kafka and writes it to the console.

[root@localhost ~]# kafka-console-consumer --zookeeper zk01.example.com:2181 --topic test
3.3 kafka-console-producer

Reads data from standard input and writes it to the Kafka queue.

[root@localhost ~]# /usr/bin/kafka-console-producer --broker-list kafka02.example.com:9092,kafka03.example.com:9092 --topic test2
3.4 kafka-consumer-offset-checker

Checks how many messages have been written and consumed (consumer offsets and lag).

[root@localhost ~]#/usr/bin/kafka-consumer-offset-checker --group flume --topic test1 --zookeeper zk01.example.com:2181
IV. Kafka Web UI

Use the open-source projects KafkaOffsetMonitor or kafka-manager to visualize the state of Kafka.

4.1 Running KafkaOffsetMonitor
  • Download the jar package KafkaOffsetMonitor-assembly-0.2.1.jar.
  • Execute the command to run it:
    java -cp /root/kafka_web/KafkaOffsetMonitor-assembly-0.2.1.jar com.quantifind.kafka.offsetapp.OffsetGetterWeb --dbName kafka --zk zk-server1,zk-server2 --port 8080 --refresh 10.seconds --retain 2.days
  • To run it under supervisor, create a kafka_web.conf file under the /etc/supervisord.d directory, as follows:
    [program:kafka_web]
    command=java -cp /root/kafka_web/KafkaOffsetMonitor-assembly-0.2.1.jar com.quantifind.kafka.offsetapp.OffsetGetterWeb --dbName kafka --zk zk-server1,zk-server2 --port 8080 --refresh 10.seconds --retain 2.days
    startsecs=0
    stopwaitsecs=0
    autostart=true
    autorestart=true
4.2 Running kafka-manager

Running kafka-manager requires compiling with sbt, but compilation is cumbersome and does not always succeed, so I ran it directly with Docker.

    • On CentOS, add the daocloud acceleration mirror to the Docker configuration in /etc/sysconfig/docker and modify the Docker run parameters:
      other_args=" --registry-mirror=http://7919bcde.m.daocloud.io --insecure-registry=0.0.0.0:5000 -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock --api-enable-cors=true"
      Then restart Docker.
    • Run the Docker command, then access the UI on port 9000:
        docker run -d -p 9000:9000 -e ZK_HOSTS="zk_host:2181" -e APPLICATION_SECRET=kafka-manager sheepkiller/kafka-manager
V. Performance testing

Regarding performance testing: Kafka creator Jay Kreps published a benchmark, and the following descriptions are based on it. (The benchmark was run against Kafka 0.8.1.)

5.1 Test environment

The benchmark used six machines, each configured as follows:

Intel Xeon 2.5 GHz processor with six cores
Six 7200 RPM SATA Drives
32GB of RAM
1Gb Ethernet

Of these 6 machines, 3 were used to build the Kafka broker cluster, and 3 were used to install ZooKeeper and generate test data. The 6 drives were mounted directly, without RAID. In fact, Kafka's hardware requirements are similar to Hadoop's.

5.2 Producer Throughput Rate

This test measured only producer throughput: data was persisted, but no consumer read it.

  • 1 producer thread, no replication
    In this test, a topic with 6 partitions and no replication was created, and a single thread produced short messages (100-byte payloads) as fast as possible. The result is 821,557 records/second (78.3 MB/second).
    Short messages were used because they are the harder case for a messaging system: if throughput is characterized in MB/second, sending long messages undoubtedly makes the results look better.
    Throughout the test, MB/second is computed as the number of messages delivered per second multiplied by the payload length, excluding message metadata, so actual network usage is higher than reported. Each message carries about 22 extra bytes (an optional key, a message-length field, a CRC, etc.), plus per-request overhead for the topic, partition, acknowledgement, and so on. This makes it harder to determine exactly when the network adapter's limit is reached, but once this overhead is counted in, the throughput essentially reaches the limit of the network card.
    At first glance this result looks much higher than expected, especially considering that Kafka persists data to disk. With a random-access data system such as an RDBMS or a key-value store, one can expect at most roughly 5,000 to 50,000 requests per second, which is about the rate a good RPC layer can accept. There are two reasons for this test result:
    Kafka ensures that writes to disk are linear (sequential) I/O, and the maximum linear-I/O throughput of the 6 low-cost disks used in the test is 822 MB/second, far larger than what the 1 Gb NIC can carry. Many messaging systems find persisting data to disk costly precisely because their disk operations are not linear I/O.
    At every stage, Kafka uses batch processing wherever possible. To understand the importance of batching in I/O operations, see David Patterson's "Latency Lags Bandwidth".
  • 1 producer thread, 3× asynchronous replication
    This test is basically the same as the previous one, the only difference being that each partition has 3 replicas (so the total amount of data sent over the network and written to disk triples). Each broker acts as the leader for some partitions and as a follower for others, reading data from the leader and writing it to disk. The result is 75.1 MB/second.
    Replication in this test is asynchronous: the broker acknowledges the producer as soon as it has written the data to its local disk, without waiting for all replicas to finish replicating. This means that if the leader crashes, some of the latest data that has not yet been replicated may be lost; in exchange, message acknowledgement has lower latency and better real-time behavior.
    This test shows that replication can be fast. With 3× replication the cluster performs three times the writes, leaving each producer only about one-third of the raw write capacity, yet the per-producer throughput is still quite good.
  • 1 producer thread, 3× synchronous replication
    The only difference from the previous test is that replication is synchronous: a message is marked committed (at which point the broker sends the acknowledgement to the producer) only after every replica in the in-sync set has copied it.
    In this mode, Kafka guarantees no data loss even if the leader crashes. The result is 40.2 MB/second. Kafka's synchronous replication is not fundamentally different from its asynchronous replication: the leader always tracks the follower replicas to monitor whether they are still alive, and only messages replicated by all in-sync replicas are acknowledged and can be consumed. The waiting on followers is what lowers the throughput. This could be improved by increasing the batch size, but the test avoided such targeted optimizations to keep the results comparable.
  • 3 producers, 3× asynchronous replication
    This test replicates the 1-producer setup above onto 3 different machines (running multiple instances on one machine would not help much, since the NIC is already close to saturation), all sending data simultaneously. The throughput of the entire cluster is 193.0 MB/second.
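The records/second and MB/second figures above are related through the 100-byte payload. A one-line sanity check (assuming 1 MB = 2^20 bytes, which reproduces the 78.3 MB/second reported for the first test):

```shell
# Convert the first test's 821,557 records/second at 100 bytes per record
# into MB/second (1 MB = 1024 * 1024 bytes).
awk 'BEGIN { printf "%.1f MB/second\n", 821557 * 100 / (1024 * 1024) }'
# prints: 78.3 MB/second
```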
5.3 Producer throughput Vs. Stored Data

A potential danger for a messaging system is one that performs well while the data fits in memory, but whose throughput collapses once the data volume exceeds memory and spills to disk (many messaging systems delete data once it is consumed, but when consumption is slower than production, data accumulates). The reduced throughput then prevents the system from receiving data in time. This is very bad, since in many scenarios the whole point of a queue is to absorb mismatches between production and consumption rates.
Kafka does not have this problem, because it always persists data to disk with O(1) time complexity, so its throughput is unaffected by the amount of data stored on disk. To verify this, a large volume of data was tested over a long period. The tests show no significant difference in throughput between 1 TB of disk data and only a few hundred MB; the small variance observed is caused by Linux I/O management, which caches data and then flushes it in batches.

5.4 Consumer throughput rate

Note that the replication factor does not affect the consumer throughput test: the consumer reads only from each partition's leader, regardless of the replication factor. Likewise, consumer throughput is independent of whether replication is synchronous or asynchronous.
1 x Consumer
The test consumed messages from a topic with 6 partitions and 3× replication. The result is 89.7 MB/second. As you can see, Kafka's consumer is very efficient: it reads file blocks directly from the broker's filesystem, and Kafka uses the sendfile API to transfer them through the operating system without copying the data into user space. This test actually read from the beginning of the log, so real disk I/O occurred. In a production environment, the consumer can often read data the producer has just written (still in the page cache); running iostat in such an environment shows essentially no physical reads. In other words, consumer throughput in production is even higher than in this test.
3 x Consumer
Replicating the consumer above onto 3 different machines and running them in parallel (all consuming from the same topic) gives 249.5 MB/second; as expected, consumer throughput increases almost linearly.

5.5 Producer and Consumer

The tests above exercised the producer and consumer separately; this test runs both at once, which is closer to real usage. In fact, the followers in the replication system work essentially like consumers.
The test uses 1 producer and 1 consumer on a topic with 6 partitions and 3 replicas, with asynchronous replication. The result is 75.8 MB/second, almost identical to the result of testing 1 producer alone, so the consumer is very lightweight.

5.6 Effect of message length on throughput rate

All the tests above used short messages (100-byte payloads). As mentioned, short messages are the harder case for Kafka, and one can expect records/second to decrease as the message length increases while MB/second improves. As expected, the number of messages sent per second decreases as the message length grows, while the total bytes sent per second increase with it. With 10-byte messages, the CPU becomes the bottleneck (frequent enqueueing and time spent acquiring locks) and bandwidth is not fully utilized. From about 100 bytes upward, bandwidth usage approaches saturation (MB/second still increases with message length, but by smaller and smaller amounts).

5.7 End-to-end latency

Having discussed throughput, what about the latency of message delivery, i.e., how long it takes a message to travel from producer to consumer? The test created 1 producer and 1 consumer and timed repeatedly. The results: 2 ms (median), 3 ms (99th percentile), 14 ms (99.9th percentile). (The report does not say how many partitions the topic had, how many replicas, or whether replication was synchronous or asynchronous. These can greatly affect the latency of messages sent by the producer, since only committed messages can be consumed, which ultimately affects end-to-end latency.)

5.8 Reproduce the benchmark

Readers who want to reproduce the benchmark on their own machines can refer to this test's configuration and the commands used.
In fact, the Kafka distribution ships with a producer performance testing tool, which can be started via the bin/kafka-producer-perf-test.sh script.
Readers can also refer to another Kafka performance test report.
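The bundled perf tool mentioned above can be invoked roughly as follows. This is a sketch: the flag names follow the 0.8-era script, and the broker list, topic, and message counts are illustrative assumptions:

```shell
# Hedged example of Kafka's bundled producer perf tool (0.8-era flags
# assumed; broker hosts, topic, and counts are illustrative).
bin/kafka-producer-perf-test.sh \
  --broker-list kafka02.example.com:9092,kafka03.example.com:9092 \
  --topics test \
  --messages 1000000 \
  --message-size 100 \
  --threads 1
```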



By modeyangg_cs (Jianshu author)
Original link: http://www.jianshu.com/p/f78b773ddde5
Copyright belongs to the author. Please contact the author for authorization before reuse, and credit the Jianshu author.
