MyRocks Record Format Analysis

Last Update:2016-11-09 Source: Internet

Author: User

Overview

RocksDB is a key-value (KV) storage engine, so MyRocks records are ultimately stored in RocksDB as KV pairs. A MySQL table typically consists of several indexes. In the InnoDB storage engine each index corresponds to a B-tree, whereas in the RocksDB storage engine each index corresponds to a contiguous range of keys in RocksDB.
Specifically, this range covers all keys between the index's ID and ID+1. If all the indexes of a table are in one column family, the table's index data is basically physically contiguous.
See the illustrations in the previous article for reference.
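
The key range described above can be sketched as follows. This is a minimal illustration, assuming the 4-byte index ID is written big-endian so that byte order matches numeric order. The helper names `encode_index_id`, `index_range`, and `key_in_index` are hypothetical and are not MyRocks functions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Encode a 4-byte index ID in big-endian order so that comparing the
// bytes gives the same result as comparing the numbers.
std::string encode_index_id(uint32_t index_id) {
  std::string out(4, '\0');
  out[0] = static_cast<char>((index_id >> 24) & 0xFF);
  out[1] = static_cast<char>((index_id >> 16) & 0xFF);
  out[2] = static_cast<char>((index_id >> 8) & 0xFF);
  out[3] = static_cast<char>(index_id & 0xFF);
  return out;
}

// Half-open key range [lower, upper) holding every record of one index.
std::pair<std::string, std::string> index_range(uint32_t index_id) {
  assert(index_id != UINT32_MAX);  // index_id + 1 must not wrap around
  return {encode_index_id(index_id), encode_index_id(index_id + 1)};
}

// True if `key` falls inside the index's range (bytewise comparison).
bool key_in_index(const std::string& key, uint32_t index_id) {
  const std::pair<std::string, std::string> range = index_range(index_id);
  return key >= range.first && key < range.second;
}
```

Any key that starts with the encoded ID of index 260 sorts at or after `encode_index_id(260)` and before `encode_index_id(261)`, which is why a range scan between those two bounds visits exactly one index.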

MyRocks record format

MyRocks stores all the indexes of a table in RocksDB, and records are organized by index.
The format of a MyRocks record depends on the type of index. The following table is used to show the record format for each index type:

CREATE TABLE t1 (
  a INT,
  b VARCHAR(...),
  c CHAR(5),
  d INT,
  pk INT AUTO_INCREMENT,
  PRIMARY KEY (pk) COMMENT 'cf_1',
  UNIQUE KEY idx2 (b) COMMENT 'cf_2'
) ENGINE=ROCKSDB;

INSERT INTO t1 (pk, a, b, c) VALUES (1, 1, 'bbbbbbbbbb', 'C');

Primary key
A primary key index record has the following KV structure:
```
key: index_id, M(pk)
value: unpack_info, NULL-bitmap, a, b, c, d
```
The key consists of the index ID and the primary key. index_id is the unique identifier of the index and occupies 4 bytes. M(pk) denotes the primary key after conversion; the converted data can be compared directly with memcmp.

RocksDB sorts data by key. To make comparison easy, values of different types undergo a conversion, after which they can be compared directly with memcmp.
The memcmp conversions are explained in the next section.

The value stores unpack_info and the non-primary-key column data. The NULL-bitmap identifies which columns are NULL.
unpack_info stores the information needed to convert M(pk) back to pk, and is NULL if no additional conversion information is required. In this example pk is of type INT, which requires no additional information, so unpack_info is NULL.

Secondary index idx2
A secondary index record has the following KV structure:
```
key: index_id, NULL-byte, M(b), M(pk)
value: unpack_info
```
The key consists of index_id, the secondary index key, and the primary key, where NULL-byte indicates whether b is NULL. The primary key pk cannot be NULL, so it needs no NULL-byte.
The value contains only unpack_info, which holds the information needed to convert M(b) and M(pk) back; it is NULL if no additional conversion information is required. In this example b is a VARCHAR, which does require additional information, so unpack_info is not NULL.
A unique index is stored in the same way as an ordinary secondary index.
In a composite index, each nullable column gets its own NULL-byte indicating whether that column is NULL.
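
The secondary key layout can be sketched as follows. To keep the example short it indexes a hypothetical nullable INT column rather than the VARCHAR column b. It assumes the NULL-byte is 0 for NULL and 1 otherwise, and that a NULL column contributes no further bytes; both are assumptions for illustration. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Append a 32-bit value in big-endian byte order.
void append_be32(std::string& out, uint32_t v) {
  for (int shift = 24; shift >= 0; shift -= 8) {
    out.push_back(static_cast<char>((v >> shift) & 0xFF));
  }
}

// Append M(v) for a signed INT: big-endian with the sign bit flipped.
void append_int32(std::string& out, int32_t v) {
  append_be32(out, static_cast<uint32_t>(v) ^ 0x80000000u);
}

// Build index_id, NULL-byte, M(col), M(pk).
// `col == nullptr` stands for SQL NULL (an assumption of this sketch).
std::string make_secondary_key(uint32_t index_id, const int32_t* col,
                               int32_t pk) {
  std::string key;
  append_be32(key, index_id);
  if (col == nullptr) {
    key.push_back('\0');    // NULL-byte: column is NULL, no M(col) follows
  } else {
    key.push_back('\x01');  // NULL-byte: column has a value
    append_int32(key, *col);
  }
  append_int32(key, pk);    // pk is NOT NULL, so it has no NULL-byte
  assert(key.size() == (col != nullptr ? 13u : 9u));
  return key;
}
```

Under these assumptions a NULL entry sorts before every non-NULL entry of the same index, because the NULL-byte is compared before the column bytes.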

memcomparable format

For RocksDB's convenience, key columns are converted into a form that can be compared directly with memcmp. For this reason MyRocks generally recommends case-sensitive collations (latin1_bin, utf8_bin, binary).
This avoids the cost of conversion.

Integer types

Converting integers is simple, but signed types need special handling: if they were stored as-is, negative numbers would compare greater than positive numbers.
Signed types are handled by flipping the sign bit, so that positive numbers compare greater than negative numbers.
The key code snippet is as follows:

Field_long::make_sort_key:
if (!table->s->db_low_byte_first) {
  if (unsigned_flag)
    to[0] = ptr[0];
  else
    to[0] = (char) (ptr[0] ^ 128);  /* reverse sign bit */
  to[1] = ptr[1];
  to[2] = ptr[2];
  to[3] = ptr[3];
}
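
The effect of flipping the sign bit can be checked with a small standalone sketch. It is not the MySQL implementation: the function names `encode_int32`, `decode_int32`, and `compare_encoded` are hypothetical, and the sketch always writes big-endian bytes regardless of the host byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Write a signed 32-bit value as 4 big-endian bytes with the sign bit
// flipped, so that memcmp() on the output orders values numerically.
void encode_int32(int32_t value, unsigned char out[4]) {
  assert(out != nullptr);
  const uint32_t flipped = static_cast<uint32_t>(value) ^ 0x80000000u;
  out[0] = static_cast<unsigned char>(flipped >> 24);
  out[1] = static_cast<unsigned char>(flipped >> 16);
  out[2] = static_cast<unsigned char>(flipped >> 8);
  out[3] = static_cast<unsigned char>(flipped);
}

// Inverse of encode_int32: rebuild the value and flip the sign bit back.
int32_t decode_int32(const unsigned char in[4]) {
  uint32_t u = (static_cast<uint32_t>(in[0]) << 24) |
               (static_cast<uint32_t>(in[1]) << 16) |
               (static_cast<uint32_t>(in[2]) << 8) |
               static_cast<uint32_t>(in[3]);
  u ^= 0x80000000u;
  int32_t value;
  std::memcpy(&value, &u, sizeof(value));  // reinterpret the bit pattern
  return value;
}

// Compare two integers through their encoded forms only.
// Returns <0, 0, or >0, like memcmp().
int compare_encoded(int32_t a, int32_t b) {
  unsigned char ka[4];
  unsigned char kb[4];
  encode_int32(a, ka);
  encode_int32(b, kb);
  return std::memcmp(ka, kb, 4);
}
```

Without the flip, -1 would be stored as ff ff ff ff and sort after 1 (00 00 00 01). With the flip, -1 becomes 7f ff ff ff and 1 becomes 80 00 00 01, so the byte order agrees with the numeric order.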

Character types

CHAR values are simply padded with spaces.

VARCHAR values are handled in a more complicated way, in order to save space.
Take the comments in the source code as an example:

const int VARCHAR_CMP_LESS_THAN_SPACES = 1;
const int VARCHAR_CMP_EQUAL_TO_SPACES = 2;
const int VARCHAR_CMP_GREATER_THAN_SPACES = 3;

Example: if fpi->m_segment_size=5, and the collation is latin1_bin:

 'abcd\0'   => [ 'abcd' <VARCHAR_CMP_LESS> ][ '\0   ' <VARCHAR_CMP_EQUAL> ]
 'abcd'     => [ 'abcd' <VARCHAR_CMP_EQUAL> ]
 'abcd   '  => [ 'abcd' <VARCHAR_CMP_EQUAL> ]
 'abcdZZZZ' => [ 'abcd' <VARCHAR_CMP_GREATER> ][ 'ZZZZ' <VARCHAR_CMP_EQUAL> ]

The string is stored in segments of m_segment_size bytes. The first m_segment_size-1 bytes of each segment hold the content, and the last byte records how the remainder of the string compares with spaces. VARCHAR_CMP_EQUAL_TO_SPACES also marks the end of the string.

In the example m_segment_size is 5; in the actual implementation the value is 9.
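
The segment scheme can be reproduced with a short sketch. It is written from the source comment quoted above rather than taken from MyRocks, and it assumes a single-byte binary collation in which trailing spaces are not significant. The function names `compare_with_spaces` and `encode_varchar` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Marker values, as in the quoted source comment.
const char VARCHAR_CMP_LESS_THAN_SPACES = 1;
const char VARCHAR_CMP_EQUAL_TO_SPACES = 2;
const char VARCHAR_CMP_GREATER_THAN_SPACES = 3;

// Classify s[pos, len) against a run of spaces of the same length.
char compare_with_spaces(const std::string& s, std::size_t pos,
                         std::size_t len) {
  for (; pos < len; ++pos) {
    const unsigned char c = static_cast<unsigned char>(s[pos]);
    if (c < ' ') return VARCHAR_CMP_LESS_THAN_SPACES;
    if (c > ' ') return VARCHAR_CMP_GREATER_THAN_SPACES;
  }
  return VARCHAR_CMP_EQUAL_TO_SPACES;
}

// Encode a string into fixed-size segments: segment_size-1 content bytes
// followed by one marker byte.
std::string encode_varchar(const std::string& s, std::size_t segment_size) {
  assert(segment_size >= 2);
  std::size_t len = s.size();
  while (len > 0 && s[len - 1] == ' ') --len;  // drop trailing spaces
  const std::size_t chunk = segment_size - 1;
  std::string out;
  std::size_t pos = 0;
  for (;;) {
    const std::size_t n = std::min(chunk, len - pos);
    out.append(s, pos, n);
    pos += n;
    if (pos == len) {
      // Last segment: pad with spaces and mark the end of the string.
      out.append(chunk - n, ' ');
      out.push_back(VARCHAR_CMP_EQUAL_TO_SPACES);
      return out;
    }
    // More content follows: record how the rest compares with spaces.
    out.push_back(compare_with_spaces(s, pos, len));
  }
}
```

With a segment size of 5 this reproduces the four cases from the comment. The marker byte is what makes 'abcd\0' sort before 'abcd' and 'abcd' sort before 'abcdZZZZ' under a plain bytewise comparison.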

unpack_info is more complex here. It differs depending on the string's collation, because it has to store the mapping used to convert between collations.
For details, see the function rdb_init_collation_mapping.

RocksDB internal record format

What we have seen so far is the KV structure of a record before it enters RocksDB. In fact, the key is wrapped further before the data is stored in RocksDB.
The key before it enters RocksDB is called the user key; inside RocksDB it is called the internal key.

InternalKey = | user key (string) | sequence number (7 bytes) | value type (1 byte) |

The sequence number is the record's sequence: it is incremented in the order in which records enter RocksDB.
The sequence number is the key to how RocksDB implements transaction processing, which will be discussed later.

The value type is the type of the record: put, merge, delete, and so on.
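
The wrapping of a user key into an internal key can be sketched as follows. The sketch assumes, based on our reading of RocksDB's `dbformat.h`, that the sequence number and type are packed into one 64-bit value as `(seq << 8) | type` and appended as 8 little-endian bytes, and that the type constants are 0 for delete, 1 for put, and 2 for merge. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Record types; the numeric values are an assumption of this sketch.
enum ValueType : uint8_t { kTypeDeletion = 0, kTypeValue = 1, kTypeMerge = 2 };

// Append the 8-byte trailer (7-byte sequence number + 1-byte type)
// to a user key.
std::string make_internal_key(const std::string& user_key, uint64_t seq,
                              ValueType type) {
  assert(seq < (1ULL << 56));  // the sequence number must fit in 7 bytes
  const uint64_t packed = (seq << 8) | type;
  std::string key = user_key;
  for (int i = 0; i < 8; ++i) {
    key.push_back(static_cast<char>((packed >> (8 * i)) & 0xFF));
  }
  return key;
}

// Read the packed trailer back from the last 8 bytes.
uint64_t unpack_trailer(const std::string& internal_key) {
  assert(internal_key.size() >= 8);
  const std::size_t base = internal_key.size() - 8;
  uint64_t packed = 0;
  for (int i = 0; i < 8; ++i) {
    const unsigned char byte =
        static_cast<unsigned char>(internal_key[base + i]);
    packed |= static_cast<uint64_t>(byte) << (8 * i);
  }
  return packed;
}

uint64_t sequence_of(const std::string& internal_key) {
  return unpack_trailer(internal_key) >> 8;
}

ValueType type_of(const std::string& internal_key) {
  return static_cast<ValueType>(unpack_trailer(internal_key) & 0xFF);
}
```

Because the trailer is little-endian in this sketch, the type byte is the first of the eight trailer bytes even though the layout above lists the sequence number first. The layout describes the logical fields, not necessarily the byte order.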

Example

An example makes this more concrete. Using the table described above, insert a record and look at its actual structure.

INSERT INTO t1 (pk, a, b, c) VALUES (1, 1, 'bbbbbbbbbb', 'C');

Querying the data dictionary shows that the index_id of the primary key is 260 and the index_id of the secondary index is 261.

SELECT * FROM information_schema.ROCKSDB_DDL WHERE table_name = "t1";

TABLE_SCHEMA  TABLE_NAME  PARTITION_NAME  INDEX_NAME  COLUMN_FAMILY  INDEX_NUMBER  INDEX_TYPE  KV_FORMAT_VERSION  CF
test          t1          NULL            PRIMARY     2              260           1           11                 cf_1
test          t1          NULL            idx2        3              261           2           11                 cf_2

Primary key record
- Key

- Value
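
For the primary key record, the key bytes can be reconstructed from the format described earlier. The sketch below assumes a big-endian 4-byte index_id followed by the sign-flipped big-endian INT for pk. The helper names `append_be32`, `make_primary_key`, and `to_hex` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Append a 32-bit value in big-endian byte order.
void append_be32(std::string& out, uint32_t v) {
  for (int shift = 24; shift >= 0; shift -= 8) {
    out.push_back(static_cast<char>((v >> shift) & 0xFF));
  }
}

// Build index_id, M(pk) for a signed INT primary key.
std::string make_primary_key(uint32_t index_id, int32_t pk) {
  std::string key;
  append_be32(key, index_id);
  append_be32(key, static_cast<uint32_t>(pk) ^ 0x80000000u);  // flip sign bit
  assert(key.size() == 8);  // 4-byte index ID + 4-byte M(pk)
  return key;
}

// Render bytes as lowercase hexadecimal for inspection.
std::string to_hex(const std::string& bytes) {
  static const char digits[] = "0123456789abcdef";
  std::string hex;
  for (unsigned char c : bytes) {
    hex.push_back(digits[c >> 4]);
    hex.push_back(digits[c & 0x0F]);
  }
  return hex;
}
```

Under these assumptions, `to_hex(make_primary_key(260, 1))` yields `0000010480000001`: the bytes 00 00 01 04 are index_id 260 and 80 00 00 01 is M(pk) for pk = 1.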

Secondary index record
- Key

- Value

The value here contains the space information for column b and the collation conversion mapping. It is fairly complex and is not expanded on here; interested readers can look at the function rdb_init_collation_mapping.

This article is an English version of an article originally published in Chinese on aliyun.com.