A Tour of Kafka's Design: Performance Optimization

Tags: file copy, sendfile, socket

Original address: http://blog.csdn.net/honglei915/article/details/37564757


Kafka puts a great deal of effort into efficiency. One of its main use cases is processing website activity logs, whose volume is enormous: every page view generates several writes. On the read side, even assuming each message is consumed only once, the read volume is also very large, so Kafka tries to make reads as lightweight as possible. We discussed disk performance earlier; with linear reads and writes, two things mainly hurt performance: too many small I/O operations and too many byte copies. The small-I/O problem occurs both between the client and the server and in the persistence operations inside the server.
Message Sets
To avoid these problems, Kafka introduces the concept of a message set, which groups messages together as the unit of processing. Handling messages as a set is much more efficient than handling them one at a time: the producer sends a set of messages to the server instead of sending them one by one, the server appends the whole message set to the log file in a single operation, which cuts down on small I/O operations, and the consumer likewise fetches a message set in a single request.
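As a rough illustration of producer-side batching, the sketch below uses the modern Java client (which postdates this article); the batch.size and linger.ms settings control how many messages are grouped into one message set per request, and the broker address and topic name are made up for the example.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class BatchingProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // hypothetical broker address
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        // Accumulate up to 64 KB of messages per partition before sending one batch...
        props.put("batch.size", "65536");
        // ...or wait at most 10 ms for more messages to arrive, whichever comes first.
        props.put("linger.ms", "10");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 1000; i++) {
                // Individual send() calls are grouped into message sets by the client,
                // so the broker receives batches rather than single messages.
                producer.send(new ProducerRecord<>("activity-log", "page-" + i, "view"));
            }
        }
    }
}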
Another optimization concerns byte copying. It is not a problem at low load, but under high load the cost becomes significant. To avoid it, Kafka uses a standard binary message format that the producer, broker, and consumer all share, so data can move between them without any transformation.
Zero Copy
The message log maintained by the broker is simply a directory of files, and message sets are written to those log files in the same format that the producer and consumer share. This gives Kafka an important optimization opportunity: the transfer of messages over the network. Modern Unix operating systems provide a highly optimized path for sending data from the page cache to a socket; on Linux this is the sendfile system call.
To better understand the benefit of sendfile, consider the usual path for sending data from a file to a socket:

1. The operating system reads the data from disk into the page cache in kernel space.
2. The application reads the data from the page cache into its own user-space buffer.
3. The application writes the data back into a socket buffer in kernel space.
4. The operating system copies the data from the socket buffer to the NIC buffer, from which it is sent out over the network.

This is clearly inefficient: four copies and two system calls. With sendfile, the data is sent from the page cache to the NIC buffer directly, avoiding the redundant copies and greatly improving performance.
In a multi-consumer scenario, the data is copied into the page cache only once and reused for every consumer, rather than being copied again each time a message is consumed. This allows messages to be delivered at close to network bandwidth, and at the disk level you see hardly any reads, because the data is served straight from the page cache to the network.
This article describes in detail the application of Sendfile and Zero-copy technology in Java.
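As a minimal sketch (not Kafka's actual code), the snippet below shows the zero-copy path as Java exposes it: FileChannel.transferTo delegates the transfer to the kernel, which on Linux typically uses sendfile underneath; the segment file name, host, and port are made up for the example.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ZeroCopySend {
    public static void main(String[] args) throws IOException {
        // Open a log segment for reading and a socket to the (hypothetical) peer.
        try (FileChannel file = FileChannel.open(Paths.get("00000000000000000000.log"),
                                                 StandardOpenOption.READ);
             SocketChannel socket = SocketChannel.open(new InetSocketAddress("localhost", 9092))) {
            long position = 0;
            long remaining = file.size();
            while (remaining > 0) {
                // transferTo asks the kernel to move bytes from the page cache to the
                // socket, avoiding the copy into user space and back.
                long sent = file.transferTo(position, remaining, socket);
                position += sent;
                remaining -= sent;
            }
        }
    }
}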
Data Compression
Often the bottleneck is not the CPU or the disk but network bandwidth, especially for applications that need to move large volumes of data between data centers. Users could of course compress their own messages without any support from Kafka, but that yields a poorer compression ratio: compressing many messages together compresses far better than compressing each message separately.
Kafka therefore uses end-to-end compression. Because of the message set concept, the client can compress a batch of messages together and send it to the server, where it is written to the log file still in compressed form and later delivered to the consumer still in compressed form. Messages stay compressed all the way from the producer to the consumer and are decompressed only when the consumer uses them, hence the name end-to-end compression.
Kafka supports the GZIP and Snappy compression codecs. More detailed information can be found here.
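For illustration, enabling end-to-end compression is a single setting on the Java producer sketched earlier; the codec below is one of the two the article names, and the broker address and topic are again hypothetical.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class CompressedProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // hypothetical broker address
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        // Compress each batch of messages as a unit; "snappy" is the other codec mentioned above.
        props.put("compression.type", "gzip");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The batch is compressed on the producer, stored compressed by the broker,
            // and decompressed only by the consumer.
            producer.send(new ProducerRecord<>("activity-log", "key", "value"));
        }
    }
}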
