Kafka file storage mechanism explained

Source: Internet
Author: User
Tags: crc32, message queue
What's Kafka?

Kafka, originally developed by LinkedIn, is a distributed, partitioned, replicated, multi-subscriber log system coordinated by ZooKeeper (it can also serve as an MQ system). It is commonly used for web/nginx logs, access logs, messaging services, and so on. LinkedIn contributed it to the Apache Software Foundation in 2011, and it later became a top-level open source project.

1. Foreword

Whether a commercial message queue performs well depends heavily on the design of its file storage mechanism, which is one of the most critical indicators of the technical level of a message queue service.
Starting from Kafka's file storage mechanism and physical structure, this article analyzes how Kafka achieves efficient file storage and how it performs in practice.

2. Kafka file storage mechanism

Some Kafka terms are explained as follows:
Broker: A message middleware processing node. One Kafka node is a broker, and multiple brokers can form a Kafka cluster.
Topic: A category of messages. For example, page view logs and click logs can each exist as a topic, and a Kafka cluster can handle the distribution of multiple topics at the same time.
Partition: A physical grouping of a topic. A topic can be divided into multiple partitions, and each partition is an ordered queue.
Segment: A partition is physically composed of multiple segments, which are described in detail in 2.2 and 2.3 below.
Offset: Each partition consists of an ordered, immutable sequence of messages that are appended to the partition sequentially. Each message in the partition has a sequential number called the offset, which uniquely identifies the message within the partition.

The analysis is divided into the following 4 steps:
1. Partition storage distribution in a topic
2. Partition file storage mode
3. Segment file storage structure in a partition
4. How to find a message through its offset in a partition

A detailed analysis of these 4 steps gives a clear picture of how the Kafka file storage mechanism works.

2.1 Partition storage distribution in a topic

Suppose the Kafka cluster in the experimental environment has only one broker, and xxx/message-folder is the root directory for data files, set in the broker's server.properties file (parameter log.dirs=xxx/message-folder). For example, create 2 topics named report_push and launch_info, each with partitions=4.
The storage paths and directory layout are:
xxx/message-folder

              |--report_push-0
              |--report_push-1
              |--report_push-2
              |--report_push-3
              |--launch_info-0
              |--launch_info-1
              |--launch_info-2
              |--launch_info-3

In Kafka file storage, a topic has several partitions, and each partition is a directory. The partition naming rule is topic name + sequential ordinal: the first partition's ordinal is 0, and the largest ordinal is the number of partitions minus 1.
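The naming rule can be sketched in a few lines of Python. This is only an illustration of the rule as described here; the function name `partition_dirs` is hypothetical and is not part of Kafka.

```python
from pathlib import Path
from typing import List


def partition_dirs(log_dir: str, topic: str, num_partitions: int) -> List[Path]:
    """Return one directory path per partition of a topic (illustrative helper)."""
    # Ordinals start at 0 and end at num_partitions - 1.
    return [Path(log_dir) / f"{topic}-{i}" for i in range(num_partitions)]


for topic in ("report_push", "launch_info"):
    for directory in partition_dirs("xxx/message-folder", topic, 4):
        print(directory)
```

Run as is, this prints the eight directory paths shown in the listing above.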
For distribution across multiple brokers, refer to the analysis of the Kafka cluster partition distribution principle.

2.2 Partition file storage mode

The following schematic illustrates how files are stored in a partition:

                              Figure 1
Each partition (directory) is equivalent to one huge file that is split evenly into multiple segment data files of equal size. However, the number of messages in each segment file is not necessarily equal, which makes it easy to delete old segment files quickly. Each partition only needs to support sequential reads and writes, and the lifecycle of a segment file is determined by server-side configuration parameters.
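A toy in-memory model may help make the segment idea concrete. It is a sketch under stated assumptions, not Kafka's implementation: the classes `Segment` and `PartitionLog` and the parameter `max_segment_bytes` are hypothetical names, and the rolling policy (start a new segment when the next message would exceed the size limit) is a simplification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    messages: List[bytes] = field(default_factory=list)
    size_bytes: int = 0


class PartitionLog:
    """Toy partition: appends sequentially and rolls segments by size."""

    def __init__(self, max_segment_bytes: int) -> None:
        self.max_segment_bytes = max_segment_bytes
        self.segments: List[Segment] = [Segment()]

    def append(self, message: bytes) -> None:
        active = self.segments[-1]
        # Roll to a new segment if this message would overflow the active one.
        if active.messages and active.size_bytes + len(message) > self.max_segment_bytes:
            active = Segment()
            self.segments.append(active)
        active.messages.append(message)
        active.size_bytes += len(message)

    def delete_oldest(self) -> None:
        # Removing a whole segment is a single cheap step (one file delete on disk).
        if len(self.segments) > 1:
            self.segments.pop(0)


log = PartitionLog(max_segment_bytes=100)
for size in (40, 40, 40, 10, 10, 10, 10, 10, 90):
    log.append(b"x" * size)
print([len(s.messages) for s in log.segments])  # [2, 6, 1]
```

The segments are bounded by the same size limit, yet they hold 2, 6, and 1 messages, which is the point made above: equal-sized files do not imply equal message counts.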

The advantage of splitting a partition into segments is that unneeded files can be removed quickly, which effectively improves disk utilization.

2.3 Segment file storage structure in a partition

Section 2.2 described how Kafka stores a partition in the file system. This section analyzes the composition and physical structure of the segment files in a partition in depth.
Segment file composition: a segment consists of 2 parts, an index file and a data file. The 2 files correspond one to one and always appear in pairs; the suffixes ".index" and ".log" denote the segment index file and the data file, respectively.
Segment file naming rule: the first segment of a partition starts from 0, and each subsequent segment file is named after the offset of the last message in the previous segment file. The value can be as large as a 64-bit long (up to 19 decimal digits), and unused leading positions are padded with 0.
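As a small illustration of the two rules, the sketch below builds the pair of file names for a segment. The helper `segment_file_names` is hypothetical, and the default padding width of 20 characters is an assumption taken from the length of the file names quoted in section 2.4, not from a Kafka specification.

```python
from typing import Tuple


def segment_file_names(name_offset: int, width: int = 20) -> Tuple[str, str]:
    """Return the (.index, .log) file names for a segment (illustrative helper).

    name_offset is the number the segment is named after; width is an assumed
    zero-padded length.
    """
    stem = str(name_offset).zfill(width)  # left-pad with zeros
    return f"{stem}.index", f"{stem}.log"


print(segment_file_names(0))
print(segment_file_names(368769))
```

Both names in a pair share the same zero-padded stem, so sorting the file names lexicographically also sorts the segments numerically.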

The following file list comes from the author's experiment on a Kafka broker: a topicxxx with 1 partition was created, the segment size was set to 500 MB, and a producer was started to write a large amount of data to the broker. The segment file list in Figure 2 below illustrates the 2 rules above:

            Figure 2

Taking one pair of segment files in Figure 2 above as an example, the physical structure of the index <--> data file correspondence within a segment is as follows:

            Figure 3

In Figure 3 above, the index file stores a large amount of metadata, the data file stores a large number of messages, and each metadata entry in the index file points to the physical offset address of a message in the corresponding data file.
Take the metadata entry (3, 497) in the index file as an example: it refers to the 3rd message in the data file (message No. 368772 in the partition as a whole), and the physical offset address of that message is 497.
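The sketch below shows one way such index entries could be decoded and mapped back to partition-wide message numbers. The binary layout (a 4-byte relative offset followed by a 4-byte file position, both big-endian) is an assumption for illustration, since the article does not state the on-disk entry format. The entry (1, 0) is made up, and only (3, 497) comes from the text.

```python
import struct
from typing import List, Tuple

# Assumed entry layout: 4-byte relative offset + 4-byte position, big-endian.
ENTRY = struct.Struct(">ii")


def read_index(raw: bytes) -> List[Tuple[int, int]]:
    """Decode a raw index buffer into (relative_offset, position) pairs."""
    return [ENTRY.unpack_from(raw, i) for i in range(0, len(raw), ENTRY.size)]


def partition_offset(file_name_offset: int, relative_offset: int) -> int:
    """Convert an index-relative number to the partition-wide message number."""
    return file_name_offset + relative_offset


raw = ENTRY.pack(1, 0) + ENTRY.pack(3, 497)  # (1, 0) is a hypothetical entry
for relative, position in read_index(raw):
    print(partition_offset(368769, relative), position)
```

For the entry (3, 497) in the segment named 368769, this yields message 368772 at position 497, matching the example above.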

As Figure 3 above shows, a segment data file consists of many messages. The physical structure of a message is described in detail below:

           Figure 4
Parameter description:
Field                   Description
8 byte offset           Each message in the partition has an ordered ID number called the offset, which uniquely determines the position of the message within the partition; that is, the offset is the message's sequence number in the partition.
4 byte message size     Size of the message.
4 byte CRC32            CRC32 checksum used to verify the message.
1 byte "magic"          Protocol version number of this release of the Kafka service.
1 byte "attributes"     Indicates a standalone version, or identifies the compression type or the encoding type.
4 byte key length       Length of the key. When the key length is -1, the K byte key field is omitted.
K byte key              Optional.
value bytes payload     The actual message data.
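The field table can be turned into a small encoder and decoder to show how the pieces fit together. This is a sketch that follows the table literally and is not a faithful implementation of any Kafka wire format. It assumes big-endian fields, that "message size" counts every byte after the size field, and that the CRC32 covers every byte after the CRC field. The `Message` type and both functions are hypothetical names.

```python
import struct
import zlib
from typing import NamedTuple, Optional, Tuple


class Message(NamedTuple):
    offset: int
    magic: int
    attributes: int
    key: Optional[bytes]
    payload: bytes


def encode(msg: Message) -> bytes:
    key_len = -1 if msg.key is None else len(msg.key)
    # magic (1 byte), attributes (1 byte), key length (4 bytes), key, payload
    body = struct.pack(">bbi", msg.magic, msg.attributes, key_len)
    body += (msg.key or b"") + msg.payload
    crc = zlib.crc32(body) & 0xFFFFFFFF  # assumed: CRC covers bytes after the CRC field
    sized = struct.pack(">I", crc) + body
    # offset (8 bytes) and message size (4 bytes) precede the checked part
    return struct.pack(">qi", msg.offset, len(sized)) + sized


def decode(buf: bytes, pos: int = 0) -> Tuple[Message, int]:
    """Decode one message at pos; return it and the position of the next one."""
    offset, size = struct.unpack_from(">qi", buf, pos)
    start = pos + 12
    end = start + size
    (crc,) = struct.unpack_from(">I", buf, start)
    body = buf[start + 4:end]
    if zlib.crc32(body) & 0xFFFFFFFF != crc:
        raise ValueError("CRC32 mismatch: message is corrupt")
    magic, attributes, key_len = struct.unpack_from(">bbi", body, 0)
    cursor = 6
    key = None
    if key_len >= 0:  # a key length of -1 means the key field is absent
        key = body[cursor:cursor + key_len]
        cursor += key_len
    return Message(offset, magic, attributes, key, body[cursor:]), end


raw = encode(Message(368772, 0, 0, b"k", b"hello"))
print(decode(raw))
```

Because `decode` returns the position of the next message, calling it in a loop walks a data file sequentially, which is the access pattern section 2.4 relies on.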
2.4 How to find a message via offset in partition

For example, to read the message with offset=368776, follow these 2 steps.

Step 1: find the segment file
Using Figure 2 above as an example, 00000000000000000000.index is the first file, and its starting offset is 0. The second file, 00000000000000368769.index, has a starting message offset of 368770 = 368769 + 1. Similarly, the starting offset of the third file, 00000000000000737337.index, is 737338 = 737337 + 1. The other subsequent files follow the same pattern: they are named and sorted by starting offset, so a binary search of the file list by offset quickly locates the specific file.
For offset=368776, this locates 00000000000000368769.index|log.
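Step 1 can be sketched with Python's `bisect` module. The code follows this article's convention that a segment named N holds the messages starting at N + 1 (with the first segment, named 0, starting at offset 0). That convention is taken from the text above and is worth checking against the Kafka version in use. The function `find_segment` is a hypothetical helper.

```python
from bisect import bisect_left
from typing import List


def find_segment(name_offsets: List[int], target: int) -> int:
    """Return the numeric name of the segment that holds `target`.

    name_offsets must be sorted ascending. Under the article's convention a
    segment named N holds offsets greater than N, so we want the last name
    strictly below the target.
    """
    i = bisect_left(name_offsets, target) - 1
    return name_offsets[max(i, 0)]  # offset 0 lives in the first segment


names = [0, 368769, 737337]
print(find_segment(names, 368776))  # 368769
```

A binary search over the sorted names takes O(log n) comparisons, so locating the file stays cheap even when a partition has accumulated many segments.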

Step 2: find the message within the segment file
With the segment file located in step 1, for offset=368776 first locate the metadata entry in 00000000000000368769.index to obtain the physical position in 00000000000000368769.log, then scan 00000000000000368769.log sequentially from that position until the message with offset=368776 is found.
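Step 2 can be sketched as a lookup in a sparse index followed by a short sequential scan. Everything here is illustrative: the data file is modeled as a list of (offset, position) pairs instead of real bytes, the function `find_position` is hypothetical, and all index entries and positions other than (3, 497) are made-up values.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def find_position(
    index: List[Tuple[int, int]],   # sparse (relative_offset, position), sorted
    log: List[Tuple[int, int]],     # every message as (offset, position), in file order
    file_name_offset: int,
    target: int,
) -> Optional[int]:
    """Return the file position of the message with offset `target`, if present."""
    relative = target - file_name_offset
    keys = [rel for rel, _ in index]
    # Last index entry at or before the target gives the scan start position.
    i = bisect_right(keys, relative) - 1
    start = index[i][1] if i >= 0 else 0
    for offset, position in log:
        if position < start:
            continue  # a real reader would seek straight to `start`
        if offset == target:
            return position
        if offset > target:
            break
    return None


index = [(1, 0), (3, 497), (6, 1407)]  # only (3, 497) is from the article
positions = [0, 250, 497, 800, 1100, 1407, 1650, 1900]  # hypothetical positions
log = [(368769 + n, pos) for n, pos in enumerate(positions, start=1)]
print(find_position(index, log, 368769, 368776))  # 1650
```

For offset 368776 the nearest index entry is (6, 1407), so the scan starts at position 1407 and only has to step over one message before reaching the target.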

As Figure 3 above shows, the advantage of this design is that the segment index file uses sparse index storage, which reduces the size of the index file and allows it to be operated on directly in memory through mmap. A sparse index keeps a metadata pointer for only some of the messages in the data file, so it saves more storage space than a dense index, but a lookup takes more time.

3. Kafka file storage mechanism: actual operation effect

Experimental environment:
Kafka cluster: 2 virtual machines
CPU: 4 cores
Physical memory: 8 GB
Network card: Gigabit NIC
JVM heap: 4 GB
For the detailed Kafka server configuration and its tuning, refer to: Kafka server.properties configuration explained

                              Figure 5                                 

As Figure 5 above shows, Kafka performs very few disk read operations while running; disk activity consists mainly of periodic batched writes, so disk operations are efficient. This is closely related to how message reads and writes are designed in Kafka's file storage. Reading and writing messages in Kafka has the following characteristics:

Writing a message: the message is transferred from the Java heap to the page cache (that is, physical memory). An asynchronous thread then flushes it from the page cache to disk.

Reading a message: the message is sent directly from the page cache to the socket. When the data is not found in the page cache, disk IO occurs: the message is loaded from disk into the page cache and then sent directly to the socket.

4. Summary

Kafka's efficient file storage design has the following features:
Kafka splits the single large file of a partition in a topic into many small segment files, which makes it easy to periodically clean up or delete files that have already been consumed and reduces disk usage.
The index information makes it possible to locate a message quickly and to determine the maximum size of the response.
Mapping all the index metadata into memory avoids disk IO when consulting a segment's index.
Sparse storage of the index file greatly reduces the space occupied by index metadata.

Reference

1. Linux page cache mechanism
2. Kafka official documentation


Original address: http://tech.meituan.com/kafka-fs-design-theory.html
