Kafka file storage mechanism, partition, and offset


What's Kafka?

Kafka, originally developed by LinkedIn, is a distributed, partitioned, multi-replica, multi-subscriber logging system coordinated by ZooKeeper (also used as an MQ system). It is commonly used for web/nginx logs, access logs, messaging services, and so on. LinkedIn contributed it to the Apache Foundation in 2010, and it became a top-level open source project.

1. Foreword

Whether a commercial message queue performs well or poorly depends heavily on its file storage mechanism; the design of that mechanism is one of the most critical indicators of a message queue's service level.

Starting from the Kafka file storage mechanism and its physical structure, this article analyzes how Kafka achieves efficient file storage, and its practical effect.

2. Kafka file storage mechanism

Some Kafka terms are explained as follows:

Broker: a message-middleware processing node. One Kafka node is one broker, and multiple brokers can form a Kafka cluster.

Topic: a category of message. For example, page-view logs and click logs can each exist as a Topic, and a Kafka cluster can be responsible for distributing multiple Topics at the same time.

Partition: a physical grouping within a topic. A topic can be divided into multiple Partitions, and each Partition is an ordered queue.

Segment: a Partition is physically composed of multiple Segments, which are described in detail in sections 2.2 and 2.3 below.

Offset: each Partition consists of a sequence of ordered, immutable messages that are continually appended to it. Each message in the Partition has a sequential serial number called the offset, which uniquely identifies a message within the Partition.

The analysis process is divided into the following 4 steps:

Partition storage distribution in a topic

How files are stored in a Partition

Segment file storage structure in a Partition

How to find a message by offset within a Partition

Through detailed analysis of these 4 processes, we can clearly understand the inner workings of the Kafka file storage mechanism.

2.1 Partition storage distribution in a topic

Suppose the Kafka cluster in the experimental environment has only one broker, with xxx/message-folder as the root directory for data file storage (configured in the Kafka broker's server.properties file via the parameter log.dirs=xxx/message-folder). As an example, create 2 topics named report_push and launch_info, each with partitions=4.

The storage path and directory rules are:

xxx/message-folder

|--report_push-0

|--report_push-1

|--report_push-2

|--report_push-3

|--launch_info-0

|--launch_info-1

|--launch_info-2

|--launch_info-3

In Kafka file storage, there are several different Partitions under the same topic, and each Partition is a directory. The Partition naming rule is topic name + ordered ordinal number: the first Partition's ordinal starts at 0, and the maximum ordinal is the number of partitions minus 1.
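The naming rule can be sketched in a couple of lines (a toy illustration; `partition_dirs` is a hypothetical helper, not a Kafka API):

```python
# Toy sketch of the partition-directory naming rule: each partition of a
# topic becomes a directory named "<topic>-<ordinal>", ordinals 0..N-1.
def partition_dirs(topic: str, num_partitions: int) -> list[str]:
    return [f"{topic}-{i}" for i in range(num_partitions)]

print(partition_dirs("report_push", 4))
# ['report_push-0', 'report_push-1', 'report_push-2', 'report_push-3']
```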

For distribution across multiple brokers, refer to "Kafka cluster partition distribution principle analysis".

2.2 How files are stored in a Partition

The following schematic illustrates how files are stored in a Partition:


Figure 1

Each Partition (directory) is equivalent to one giant file that is evenly split across multiple equally-sized segment data files. However, the number of messages in each segment file is not necessarily equal; this property makes it quick to delete old segment files.

Each Partition only needs to support sequential reads and writes, and the segment file lifecycle is determined by server-side configuration parameters.

The advantage of this is that unwanted files can be removed quickly, which effectively improves disk utilization.

2.3 Segment file storage structure in a Partition

Section 2.2 introduced the Kafka file system's partition storage mode; this section analyzes in depth the composition and physical structure of the segment files within a Partition.

Segment file composition: a segment consists of 2 parts, an index file and a data file. The 2 files correspond one to one and come in pairs; the suffixes ".index" and ".log" denote the segment index file and the segment data file respectively.

Segment file naming rule: the first segment of a Partition starts from 0, and each subsequent segment file is named after the offset value of the last message in the previous segment file. The value is at most a 64-bit long (19 decimal digits), and the file name is padded with leading zeros to 20 characters.
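A minimal sketch of this naming rule (`segment_name` is a hypothetical helper, assuming the 20-character zero-padded names seen in Figure 2 below):

```python
# Toy sketch of the segment-naming rule: the first segment is named 0; each
# later segment is named after the offset of the last message in the
# previous segment, zero-padded to 20 characters.
def segment_name(base_offset: int, suffix: str) -> str:
    return f"{base_offset:020d}.{suffix}"

print(segment_name(0, "index"))     # 00000000000000000000.index
print(segment_name(368769, "log"))  # 00000000000000368769.log
```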

The following file list comes from the author's experiment on a Kafka broker: create a topic containing 1 partition, set each segment's size to 500MB, and start a producer writing a large amount of data to the broker. The segment file list in Figure 2 below illustrates the 2 rules above:


Figure 2

Taking a pair of segment files from Figure 2 above as an example, the physical structure of the index<-->data file correspondence within a segment is described as follows:


Figure 3

In Figure 3 above, the index file stores a large amount of metadata and the data file stores a large number of messages; the metadata in the index file points to the physical offset address of the corresponding message in the data file.

Take the metadata entry 3,497 in the index file as an example: it represents the 3rd message in this data file (the message at offset 368772 globally in the Partition), and the message's physical offset address is 497.

As shown in Figure 3 above, the segment data file consists of a number of message entries, whose physical structure is described in detail below:


Figure 4

Parameter description:

Keyword | Explanation
8 byte offset | Each message within a Partition has an ordered ID number called the offset, which uniquely determines the position of the message within the Partition. That is, the offset is the message's sequence number within the Partition.
4 byte message size | The size of the message.
4 byte CRC32 | CRC32 checksum used to verify the message.
1 byte "magic" | The Kafka service protocol version number of this release.
1 byte "attributes" | Indicates a standalone version, or identifies the compression type, or the encoding type.
4 byte key length | The length of the key; when the key is -1, the K-byte key field is omitted.
K byte key | Optional.
value bytes payload | The actual message data.
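As a hedged illustration of this layout, the sketch below packs and CRC-checks one message entry with Python's struct and zlib.crc32. The field widths follow the table; treating the payload as simply the remaining bytes, and computing the CRC over everything after it, are assumptions of this sketch, not a statement of Kafka's exact wire rules.

```python
import struct
import zlib

# Sketch of the entry layout from the table above:
# 8B offset | 4B message size | 4B CRC32 | 1B magic | 1B attributes |
# 4B key length | K-byte key (omitted when length is -1) | payload.
def pack_message(offset, key, payload, magic=0, attributes=0):
    key_len = len(key) if key is not None else -1
    body = struct.pack(">bbi", magic, attributes, key_len)
    if key is not None:
        body += key
    body += payload
    crc = zlib.crc32(body) & 0xFFFFFFFF            # CRC covers magic onward
    message = struct.pack(">I", crc) + body
    return struct.pack(">qi", offset, len(message)) + message

def crc_ok(entry):
    stored = struct.unpack_from(">I", entry, 12)[0]  # CRC sits after offset+size
    return stored == (zlib.crc32(entry[16:]) & 0xFFFFFFFF)

entry = pack_message(368772, b"k", b"hello")
print(len(entry), crc_ok(entry))
```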


2.4 How to find a message via offset in partition

For example, to read the message at offset=368776, the following 2 steps are needed.

Step 1: find the segment file

Taking Figure 2 above as an example, 00000000000000000000.index represents the first file, whose starting offset is 0. The second file, 00000000000000368769.index, has a starting message offset of 368770 = 368769 + 1. Similarly, the starting offset of the third file, 00000000000000737337.index, is 737338 = 737337 + 1. Subsequent files are likewise named and sorted by starting offset, so a binary search over the file list by offset quickly locates the specific file.

When offset=368776, this locates the pair 00000000000000368769.index|log.

Step 2: find the message within the segment file

Having located the segment file in step 1, for offset=368776 we look up in 00000000000000368769.index the metadata entry and the corresponding physical position within 00000000000000368769.log, then scan 00000000000000368769.log sequentially from that position until the message with offset=368776 is found.

As Figure 3 above shows, the advantage of this design is that the segment index file uses sparse indexed storage: it keeps the index file small and allows direct in-memory operation through mmap. A sparse index sets one metadata pointer for every few messages of the data file rather than one per message; it saves more storage space than a dense index, at the cost of a little more search time.
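The 2-step lookup can be sketched with in-memory stand-ins (hypothetical structures for illustration, not Kafka's real API): binary-search the sorted segment base names, then jump via the sparse index and scan forward.

```python
import bisect

# Step 1: pick the segment whose name (the last offset of the previous
# segment) is the largest one below the target offset.
def find_segment(base_offsets, target):
    i = bisect.bisect_left(base_offsets, target) - 1
    return base_offsets[max(i, 0)]

# Step 2: the sparse index holds (relative offset, scan position) pairs for
# only some messages; start at the nearest preceding entry and scan forward.
def read_message(segment, sparse_index, base, target):
    rel = target - base
    i = bisect.bisect_right([r for r, _ in sparse_index], rel) - 1
    start = sparse_index[max(i, 0)][1]
    for off, payload in segment[start:]:
        if off == target:
            return payload
    return None

bases = [0, 368769, 737337]
print(find_segment(bases, 368776))  # 368769
```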

3. Kafka file storage mechanism: actual operation effect

Experimental environment:

Kafka cluster: composed of 2 virtual machines

CPU: 4 cores

Physical Memory: 8GB

Network card: Gigabit NIC

JVM HEAP:4GB

For detailed Kafka server-side configuration and tuning, please refer to: "Kafka server.properties configuration explained".


Figure 5

As Figure 5 above shows, Kafka performs very few disk-read operations at runtime, mainly periodic bulk disk writes, so its disk operation is efficient. This is closely related to the design of message reading and writing in Kafka file storage. Reading and writing messages in Kafka has the following characteristics:

Write message

Messages are transferred from the Java heap to the page cache (that is, physical memory).

An asynchronous thread flushes the messages from the page cache to disk.

Read message

Messages are sent directly from the page cache to the socket.

When the corresponding data is not found in the page cache, disk IO occurs: the message is loaded from disk into the page cache and then sent directly from the socket.

4. Summary

Kafka's efficient file storage design features:

Kafka splits a topic Partition's large file into many small segment files; with many small files, it is easy to periodically clean up or delete already-consumed files, reducing disk usage.

The index information makes it possible to locate a message quickly and to determine the maximum size of a response.

By mapping all the index metadata into memory, disk IO on the segment index files is avoided.

Sparse storage of index files can significantly reduce the amount of space occupied by the index file metadata.



Partition and offset in Kafka


Log mechanism

Before talking about partitioning, let's look at how Kafka stores messages, as described in the official documentation.


Partition read-write log map

First, Kafka records messages through logs. Whenever a message is produced, Kafka records it in a local log file. This is different from the application logs we usually write; "log" here refers to the commit log, which won't be explained at length.

The default location of this log file is specified in config/server.properties, by default log.dirs=/tmp/kafka-logs. On Linux nothing more needs saying; on Windows it is in the root directory of the corresponding disk (mine is on D:).

Partitions

Kafka is designed for distributed environments. If the log file (which can in fact be understood as a message database) were kept in one place, availability would inevitably suffer: if that one node goes down, everything is gone. If instead the full data were copied to every machine, the data would be excessively redundant, and because each machine's disk size is limited, no matter how many machines there are, the messages that can be handled are bounded by a single disk and cannot exceed its size. Hence the concept of the partition.

Kafka performs a certain computation on the message, hashing it to a partition. In this way one log file is divided into multiple parts, as in the read-write log diagram in the previous section. On a single broker, for example, if we create a new topic with --replication-factor 1 --partitions 2, then in the log directory we will see

a test-0 directory and a test-1 directory: the two partitions.
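That "certain computation" can be sketched as key-hash partitioning (illustrative only: the real Kafka client's default partitioner uses murmur2 on the message key; plain CRC32 stands in here):

```python
import zlib

# Hash the message key and take it modulo the partition count, so the same
# key always lands in the same partition directory (test-0 or test-1 here).
def pick_partition(key: bytes, num_partitions: int) -> int:
    return zlib.crc32(key) % num_partitions

for k in (b"user-1", b"user-2", b"user-3"):
    print(k.decode(), "-> test-%d" % pick_partition(k, 2))
```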

You might think this makes no difference. Note that the point emerges when multiple brokers are present. Here is a picture (the original is in the reference link):


Kafka Distributed partitioned storage

This is a topic with 4 partitions and 2 replication (copies): all messages are placed in 4 partition stores, and for high availability each of the 4 partitions has 2 redundant copies; then, according to the allocation algorithm, a total of 8 data directories are distributed over the broker cluster.

The result is that each broker stores less data than the full amount, yet every piece of data is redundant, so that once a machine goes down, usage is not affected. For instance, if Broker1 in the picture goes down, the remaining three brokers still retain the full set of partition data, so the cluster stays usable; but if one more goes down, the data becomes incomplete. Of course, you can configure more redundancy: for example, with a replication factor of 4, every machine holds the complete data for partitions 0-3, and several machines can go down without harm. It is a trade-off between storage consumption and high availability.

As for downtime, ZooKeeper elects a new partition leader to provide service. That is the topic of the next article.

Offsets

The previous section discussed partitioning; a partition is an ordered, immutable message queue, a commit log to whose tail new data is continually appended. Each message is assigned a subscript (sequence number), the offset, which is used to locate the message.

Which messages a consumer has consumed is kept on the consumer's side. The consumer controls this itself: it can save the offset of the last message locally and intermittently register the offset with ZooKeeper, and it can also reset the offset.

How to work out the segment from an offset

In fact, when a partition is stored, it is further divided into multiple segments, each identified through an index. Here you can look at the partition folder in your local log directory.

In my case, test-0: inside it there is an index file and a log file.


Index and log

For a given partition, suppose every 5 messages form one segment (just for the sake of explanation); when 10-odd messages have been produced, we currently have:

0.index (indicating this index covers offsets 0-4)

5.index (indicating this index covers offsets 5-9)

10.index (indicating this index covers offsets 10-14 and is not yet full)

And

0.log

5.log

10.log

When a consumer needs to read offset=8, Kafka first performs a binary search over the index file list and works out that the message should be in the log file corresponding to 5.index. Then, in the corresponding 5.log file, it searches in order, 5 -> 6 -> 7 -> 8, until 8 is found.
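The steps above can be sketched as a binary search over the segment base offsets (in this example each file name is the first offset in its segment):

```python
import bisect

# Segments hold 5 messages each; names are the first offset in the segment.
segment_bases = [0, 5, 10]

def locate_segment(offset):
    i = bisect.bisect_right(segment_bases, offset) - 1
    return segment_bases[i]

base = locate_segment(8)
print("%d.log" % base)                              # 5.log
print(" -> ".join(str(o) for o in range(base, 9)))  # 5 -> 6 -> 7 -> 8
```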

