Kafka File System Design

Last Update:2014-10-28 Source: Internet

Author: User

Tags crc32 disk usage

1. File System Description

File systems generally fall into two categories: system-level and user-level. System-level file systems include ext3, ext4, DFS, and NTFS. This article does not cover complex distributed or system-level file systems.

Instead, it analyzes the design of the Kafka file system from the perspective of Kafka's high-performance architecture.

2. Kafka File System Architecture

2.1 File System Data Flow

The following figure shows how a client request is processed:

Figure 1

When a client connects, it first sends a connection request to the Kafka broker. The acceptor thread in the broker accepts the connection and hands the client socket to one of the processor threads in round-robin fashion.
When the client sends a data request to the broker, the processor thread receives the client data and places it in the request buffer. An I/O thread then performs the logic processing and computation and places the result in the response buffer.
The processor thread is then woken up; it loops over its response queue and sends all response data back to the client.
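
The flow above can be sketched as a small simulation. This is only an illustration under stated assumptions, not Kafka's broker code: the names `run_pipeline` and `NUM_PROCESSORS`, the use of integer connection ids, and the "uppercase the payload" stand-in for request handling are all hypothetical.

```python
import queue
import threading

NUM_PROCESSORS = 2  # hypothetical value; the article does not give a number


def run_pipeline(requests):
    """Push (connection_id, payload) pairs through a request queue, one I/O
    worker, and per-processor response queues. Returns replies by connection."""
    request_q = queue.Queue()
    response_qs = [queue.Queue() for _ in range(NUM_PROCESSORS)]

    def assign(conn_id):
        # Acceptor step: sequential connection ids are spread over the
        # processors in round-robin order.
        return conn_id % NUM_PROCESSORS

    def io_worker():
        # I/O thread: take a request, do the "logic", queue the response
        # for the processor that owns the connection.
        while True:
            item = request_q.get()
            if item is None:  # sentinel: no more requests
                break
            conn_id, payload = item
            response_qs[assign(conn_id)].put((conn_id, payload.upper()))

    worker = threading.Thread(target=io_worker)
    worker.start()
    for conn_id, payload in requests:
        request_q.put((conn_id, payload))  # processor step: enqueue the request
    request_q.put(None)
    worker.join()

    # Processor step: drain each response queue and "send" to the client.
    sent = {}
    for q in response_qs:
        while not q.empty():
            conn_id, reply = q.get()
            sent.setdefault(conn_id, []).append(reply)
    return sent
```

The point of the split is that network handling (processors) and request logic (I/O thread) only meet at the two queues, so neither side blocks the other directly.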

2.2 Kafka File System Storage Structure

Figure 2

Partition distribution rules. A Kafka cluster consists of multiple Kafka brokers, and the partitions of a topic are distributed across one or more of them. Partitions are assigned to brokers in order of their partition index numbers.
When the number of partitions is greater than the number of brokers, the assignment wraps around and continues from the first broker.
Partition naming rules. A partition is named topic-name-index, where index is the partition index number, which increases from 0.
Producer. Each producer can send messages to any one or more partitions of a topic.
Consumer. Within a consumer group, Kafka delivers each message of the subscribed topic to only one consumer.
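
As a minimal sketch of the distribution rule described above (partitions handed out in index order, wrapping around when there are more partitions than brokers), assuming a simple modulo placement. The function name `assign_partitions` and the broker names are hypothetical, and a real cluster may start from a different broker or also place replicas.

```python
def assign_partitions(num_partitions, brokers):
    """Map each broker to the partition indexes it hosts, assigning
    partitions in index order and wrapping around the broker list."""
    placement = {broker: [] for broker in brokers}
    for index in range(num_partitions):
        broker = brokers[index % len(brokers)]
        placement[broker].append(index)
    return placement


# 4 partitions on 3 brokers: partition 3 wraps around to the first broker.
layout = assign_partitions(4, ["broker-0", "broker-1", "broker-2"])
```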

2.3 Kafka File System Structure: Directory

Assume for now that the Kafka cluster has only one broker and that its data directory is message-folder. As an example, the author creates a topic named report_push with 4 partitions.

The storage path and directory rules are as follows:

xxx/message-folder
  |-- report_push-0
  |-- report_push-1
  |-- report_push-2
  |-- report_push-3

The image is shown as follows:

Figure 3
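
The directory rule can be expressed in a few lines. This is a sketch only: `partition_dirs` is a hypothetical helper, and the data directory is passed in by the caller because the article elides the real path.

```python
import posixpath


def partition_dirs(data_dir, topic, num_partitions):
    """Return one directory path per partition, named <topic>-<index>."""
    return [posixpath.join(data_dir, f"{topic}-{i}") for i in range(num_partitions)]


dirs = partition_dirs("message-folder", "report_push", 4)
```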

2.4 Kafka File System Structure: Partition File Storage Method

Figure 4

How is a large number of messages stored in each partition (topic-name-index) directory? What is the file storage structure?

Are all these messages stored in one large file, stored as in a database, or stored in some other structure? The following sections analyze this layer by layer, like peeling an onion.

Consider databases for comparison, which most readers have used. The underlying file system of a database is quite complex: a database must support fast lookup, modification, and deletion by keyword or ID, as well as logging and rollback.
A database file system is therefore a paged tree structure that has to support a large number of random transactional operations. Compared with the complex queries and transactions that a database supports, the file system of the Kafka message queue is much simpler:
it only needs to support ordered production and consumption of messages by producers and consumers. The life cycle of a message is determined by the consumer.
Partition file storage structure. As shown in Figure 4, the message data of one large file is spread evenly across multiple files of the same size. In other words, a large file is cut into many segment files of equal size
(the number of messages in each segment is not necessarily equal). The life cycle of a message in each topic is determined by the last consumer: once a message has been consumed by the last consumer group, it can be deleted.

Clearly, this lets the broker quickly reclaim disk space and memory-map (mmap) the small files. The main purpose is to improve disk utilization and message processing performance.
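
The idea of cutting one logical log into size-bounded segments, and reclaiming space by dropping whole segments, can be sketched in memory as follows. The class `SegmentedLog` and its methods are hypothetical names for illustration; real segments are files on disk, and the deletion trigger here simply follows the article's description.

```python
class SegmentedLog:
    """In-memory model of a partition split into size-bounded segments."""

    def __init__(self, max_segment_bytes):
        self.max_segment_bytes = max_segment_bytes
        self.segments = [{"base_offset": 0, "messages": [], "size": 0}]
        self.next_offset = 0

    def append(self, message):
        active = self.segments[-1]
        # Roll to a new segment when the active one would exceed the limit.
        if active["messages"] and active["size"] + len(message) > self.max_segment_bytes:
            active = {"base_offset": self.next_offset, "messages": [], "size": 0}
            self.segments.append(active)
        active["messages"].append(message)
        active["size"] += len(message)
        offset = self.next_offset
        self.next_offset += 1
        return offset

    def delete_before(self, offset):
        """Drop whole segments whose messages all precede `offset`."""
        while len(self.segments) > 1 and self.segments[1]["base_offset"] <= offset:
            self.segments.pop(0)


log = SegmentedLog(max_segment_bytes=10)
for _ in range(5):
    log.append(b"abcd")  # 4 bytes each, so two messages fit per segment
bases_before = [s["base_offset"] for s in log.segments]
log.delete_before(3)     # offsets 0 and 1 are no longer needed
bases_after = [s["base_offset"] for s in log.segments]
```

Deleting a segment is a single cheap operation on a whole unit, which is why segmenting makes space reclamation fast compared with rewriting one large file.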

2.5 Kafka File System Structure: Composition of a Partition's Segment Files

Section 2.4 described how the Kafka file system stores a partition. This section introduces the structure of the segment files within that partition storage. Whether a commercial message queue performs well or poorly depends heavily on this:

the design of its file storage structure is one of the most important indicators for evaluating a message queue service. It is also the core of the message queue and the part that best reflects its technical level. This section looks inside the segment file.

A segment file consists of two parts: a segment data file and a segment index file. The two files correspond one to one and always appear in pairs.

The structure of the segment index file is as follows:

00000000000000000000.index is the file name. The numeric file name can represent values up to 2^64.

Each entry records the relative record number in the corresponding log file and the physical offset of that record, 8 bytes in total:
4 bytes: relative record offset (the offset in the current segment file minus the last offset of the previous segment file)
4 bytes: physical position in the corresponding segment data file
.........
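
A minimal sketch of the 8-byte entry layout described above, using Python's `struct` module. The big-endian byte order and the helper names `encode_index` and `decode_index` are assumptions made for illustration; the article only states the two 4-byte fields.

```python
import struct

# One index entry: 4-byte relative offset followed by 4-byte physical position.
# Big-endian unsigned integers are an assumption, not stated in the article.
INDEX_ENTRY = struct.Struct(">II")


def encode_index(entries):
    """Pack (relative_offset, position) pairs into index file bytes."""
    return b"".join(INDEX_ENTRY.pack(rel, pos) for rel, pos in entries)


def decode_index(data):
    """Unpack index file bytes back into (relative_offset, position) pairs."""
    return [INDEX_ENTRY.unpack_from(data, i) for i in range(0, len(data), INDEX_ENTRY.size)]


raw = encode_index([(1, 0), (3, 497)])
```

Because entries have a fixed size, entry number n sits at byte 8 * n, which is what makes binary search over a memory-mapped index straightforward.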

The structure of the segment data file is as follows:

00000000000000000000.log is the file name. As with the index file, the numeric file name can represent values up to 2^64, and it matches the name of the corresponding index file.

Figure 5

Parameter description:

4-byte CRC32: a CRC32 checksum computed over the message buffer, excluding the 4-byte CRC32 field itself.

1-byte "magic": the protocol version number of the data file format.

1-byte "attributes": identifies the standalone version, the compression type, and the encoding type.

Key data: optional. It can store metadata that identifies or describes the message block.

Payload data: the message body, which may contain multiple message records stored in sequence by serial number.
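
The fields above can be illustrated with a small encoder and checksum check. This is a sketch under assumptions: the 4-byte length prefixes before the key and the payload (with -1 meaning "no key") and the big-endian byte order are not given in the article, and `encode_message` and `is_valid` are hypothetical names.

```python
import struct
import zlib


def encode_message(payload, key=None, magic=0, attributes=0):
    """Build: CRC32 | magic | attributes | key length | key | payload length | payload.
    The length prefixes are an assumed detail for this sketch."""
    body = struct.pack(">BB", magic, attributes)
    if key is None:
        body += struct.pack(">i", -1)  # assumed marker for "no key"
    else:
        body += struct.pack(">i", len(key)) + key
    body += struct.pack(">i", len(payload)) + payload
    # The checksum covers everything except the 4-byte CRC32 field itself.
    crc = zlib.crc32(body) & 0xFFFFFFFF
    return struct.pack(">I", crc) + body


def is_valid(message):
    """Recompute the CRC32 over the bytes after the checksum field."""
    (stored,) = struct.unpack_from(">I", message, 0)
    return stored == (zlib.crc32(message[4:]) & 0xFFFFFFFF)


msg = encode_message(b"hello", key=b"k")
corrupted = msg[:-1] + b"x"  # flip the last payload byte
```

A reader can run `is_valid` on every message it fetches and discard or re-request anything that was corrupted on disk or in transit.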

2.6 Kafka File System: Consumer Read Process

Figure 6

Segment index file:

The index is sparse, which keeps the index file small enough to be operated on directly in memory. A sparse index stores only one key-pointer pair per storage block of the data file. It therefore uses less storage space than a dense index, but looking up a given record takes more time. A binary search over the index quickly finds a physical position in the segment data file. If the index file does not contain the exact position of the requested record, the data file is read sequentially from that position until the record is found.
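
The lookup described above (binary search in the sparse index, then a sequential scan in the data file) can be sketched as follows. For simplicity the data file is modeled as a list of record offsets and a "position" is a list index rather than a byte position; `build_sparse_index`, `find_position`, and the `every` parameter are hypothetical.

```python
from bisect import bisect_right


def build_sparse_index(offsets, every=3):
    """Keep one (offset, position) pair for every `every` records."""
    return [(off, pos) for pos, off in enumerate(offsets) if pos % every == 0]


def find_position(index, offsets, target):
    """Return the position of `target` in the data file, or None if absent."""
    keys = [off for off, _ in index]
    # Binary search: last index entry whose offset is <= target.
    i = bisect_right(keys, target) - 1
    if i < 0:
        return None
    pos = index[i][1]
    # Sequential scan forward from the indexed position.
    while pos < len(offsets) and offsets[pos] < target:
        pos += 1
    if pos < len(offsets) and offsets[pos] == target:
        return pos
    return None


data_offsets = list(range(1, 11))          # records with offsets 1..10
sparse = build_sparse_index(data_offsets)  # entries for offsets 1, 4, 7, 10
```

With one entry per three records the index is a third of the size of a dense one, at the cost of scanning at most two extra records per lookup.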

2.7 Kafka File System Structure: General Directory Structure

Figure 7

A topic has multiple partitions, and each partition is divided into multiple segment files. Only the current file is written to; all other files are read-only. When a file is full (it reaches the configured size), Kafka switches files: it creates a new file for writing, and the old file becomes read-only. Each file is named after its starting offset. As an example, assume that partition 0 of the topic report_push contains the following files:

00000000000000000000.index
00000000000000000000.log
00000000000000368769.index
00000000000000368769.log
00000000000000737337.index
00000000000000737337.log
00000000000001105814.index
00000000000001105814.log
...

00000000000000000000.index is the first file, with a starting offset of 0. The second file, 00000000000000368769.index, starts at message offset 368769. Similarly, the third file, 00000000000000737337.index, starts at offset 737337.
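
The naming rule amounts to zero-padding the starting offset to 20 digits. A one-function sketch, with `segment_file_names` as a hypothetical helper name:

```python
def segment_file_names(base_offset):
    """Return the (index, log) file names for a segment's starting offset."""
    stem = f"{base_offset:020d}"  # zero-padded to 20 digits
    return f"{stem}.index", f"{stem}.log"


names = segment_file_names(368769)
```

Fixed-width, zero-padded names mean that sorting the file names as strings gives the same order as sorting the offsets numerically.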

Naming and sorting the files by starting offset makes it easy for a consumer to fetch data beginning at a given message offset: a binary search of the file list by the requested offset locates the specific file.

The absolute offset minus the file's starting offset then gives the relative offset at which data transmission starts. For example, if a consumer wants to fetch data starting from message 368969, a binary search for 368969

locates the file 00000000000000368769.log (368969 lies between 368769 and 737337). A binary search of the index file then determines the maximum amount of data to read.
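
The worked example can be checked with a short sketch that binary-searches the sorted starting offsets. `find_segment` is a hypothetical helper; the offsets are the ones from the file list in this section.

```python
from bisect import bisect_right


def find_segment(base_offsets, target):
    """Return (segment starting offset, relative offset) for an absolute offset.
    `base_offsets` must be sorted in ascending order."""
    i = bisect_right(base_offsets, target) - 1
    if i < 0:
        raise ValueError("offset precedes the first segment")
    base = base_offsets[i]
    return base, target - base


bases = [0, 368769, 737337, 1105814]
base, relative = find_segment(bases, 368969)
```

Here `base` is 368769 and `relative` is 200, so the read starts 200 messages into the segment whose files are named after offset 368769.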

2.8 Kafka File System: Actual Results

Figure 8

Reads are served almost entirely from memory with basically no disk read operations, and only periodic batched disk writes are performed.

3. Summary

Features of an efficient file system:

A large file is divided into multiple small file segments.
Small file segments make it easy to periodically clear or delete files that have already been consumed, which reduces disk usage.
All index files are mapped into memory and operated on directly, which avoids the extra I/O that swapping segment files to disk would cause.
The index information makes it possible to determine the maximum size of a response to a consumer.
Index file entries store offsets relative to the previous segment file, which saves space.

This article is an English version of an article originally published in Chinese on aliyun.com.