One of the most common insert operations can be used to track the Innodb record format, because at the time of insertion, the system obtains a public mysql record format, and now it does not involve any storage engine, the record format is the same regardless of the storage engine corresponding to the current table. For insert, the mysql function corresponds to ha_write_row, which is specific to the Innodb Storage engine, the actually called function is the ha_innobase: write_row function. Here, Innodb first converts the received record to its own tuple, this is actually the expression of innodb corresponding to the record. It is a memory record and a logical record. Before the system writes it to the page, this record is stored in tuple mode. The following mainly describes how Innodb converts a tuple to its physical storage record from the source code perspective, this article mainly studies the implementation logic of the Code and the record format.
This section only describes the format of a Compact record.
Insert a single record in a certain page) the operation function is page_cur_tuple_insert. Its parameter is a tuple of the dtuple_t * type. Here, it first needs to allocate a space to store the physical records to be converted. Therefore, we need to calculate the space size first. The calculation method is as follows:
1. first, each record must contain the following two parts: REC_N_NEW_EXTRA_BYTES + UT_BITS_IN_BYTES (n_null). The preceding part indicates the fixed-length extra part of this format, this part is used to store what content will be given later, and it indicates which fields in all fields have null values. Of course, only the fields of nullable are stored here, if not null is specified during table creation, it will not be stored here. Here, a single bit is used to represent the null attribute of a field. The above part is named as the value of the extra_size variable by the system code.
2. statistics on the length of data in each column. When collecting statistics on this information, there are many other situations, mainly divided into fixed-length fields and variable-length fields. For fixed-length fields, its length is directly the length of the data type, for example, four bytes for the int type, six bytes for the rowid column, and no additional length. For a variable-length field, in addition to the length of the data content, you also need to calculate the storage space of its data length. If the field's literal length is greater than 255 bytes, or if the data type of a field is BLOB, two bytes are required to store the length of the field. If the definition length is less than 128 bytes, or less than 256 bytes, but the data type is not BLOB type, so the data length of this field is stored in one byte. In addition to the above two cases, it is stored in two bytes. In this section, the length of the space used to store the data of variable-length fields is also calculated as extra_size by Innodb.
So now we can know that an innodb record contains two parts: extra_size and data content. The total length of these two parts is the result calculated above, it is defined as record_size.
Next, apply for a space to convert tuples to records.
The conversion function is rec_convert_dtuple_to_rec_new. The parameters have the memory structure of the requested buf, tuples, and indexes.
The first operation here is rec = buf + extra_size. The variable rec indicates the start position of data content storage. Extra_size is the two data parts calculated above.
In this case, the rec_convert_dtuple_to_rec_comp function is called next. The following is the prototype of the function:
- void
-
- rec_convert_dtuple_to_rec_comp(
-
- /*===========================*/
-
- rec_t* rec, /*!< in: origin of record */
-
- ulint extra, /*!< in: number of bytes to
-
- reserve between the record
-
- header and the data payload
-
- (normally REC_N_NEW_EXTRA_BYTES) */
-
- const dict_index_t* index, /*!< in: record descriptor */
-
- ulint status, /*!< in: status bits of the record */
-
- const dfield_t* fields, /*!< in: array of data fields */
-
- ulint n_fields)/*!< in: number of data fields */
Rec indicates the rec variable calculated above, and extra indicates a fixed-length REC_N_NEW_EXTRA_BYTES.
- end = rec;
-
- nulls = rec - (extra + 1);
-
- n_null = index->n_nullable;
-
- lens = nulls - UT_BITS_IN_BYTES(n_null);
-
- /* clear the SQL-null flags */
-
- memset(lens + 1, 0, nulls - lens);
Here, this code is hard to understand at once, so first draw the record storage format:
| --------------------- Extra_size ----------------------------------------- | --------- fields_data ------------ |
| -- Columns_lens --- | --- null lens ---- | ------ fixed_extrasize (5) ------ | -- col1 --- | --- col2 --- | --- col2 ---- |
So what is the result of the statement nulls = rec-(extra + 1? What do you want? Because extra indicates REC_N_NEW_EXTRA_BYTES and fixed_extrasize with a fixed length, rec indicates the start position of col1 in the figure, now we can know that the result of this statement is that nulls points to the starting position of the last byte of nulllens. Now we know that nulls is one or more bytes used to store the empty flag of each nullable field. Why do we point to the starting position of the last byte of this array? It's hard to understand at once, but we can know from the code below that writing nulls is written from the back to the front, so we also understand why it points to the next byte location.
Then the following statement lens = nulls-UT_BITS_IN_BYTES (n_null); the same is true because columns_lens is in front of nulllens. If we skip all spaces of the null flag forward, then, lens is the next byte of columns_lens. When the value is written, it is also written from the back to the front.
The last statement, memset (lens + 1, 0, nulls-lens); indicates that the meaning is clear, because lens points to the starting position of the last byte of columns_lens, then 1 points to the starting position of the nulls space. nulls-lens indicates the length of the nulls space. Here, the nulls space is cleared.
The above two parts are filled with data from the back to the front. Are you worried that the data will cross the border when writing data? Actually, it is not possible, because these are all previously calculated. extrasize is fixed, including the length of nulls and columns_lens.
The above is the initialization work. The record is entered based on each field. The following code processes the null information and processes the following for each field:
- if (!(dtype_get_prtype(type) & DATA_NOT_NULL)) {
- /* nullable field */
- ut_ad(n_null--);
- if (UNIV_UNLIKELY(!(byte) null_mask)) {
- nulls--;
- null_mask = 1;
- }
- ut_ad(*nulls < null_mask);
- if (dfield_is_null(field)) {
- *nulls |= null_mask;
- null_mask <<= 1;
- continue;
- }
- null_mask <<= 1;
- }
From the first line, we can see that the condition to be processed must first be that the not null attribute is not defined, so the nulls space only stores information about these fields.
Row 4th indicates that if(Byte) null_mask)When the value is 0, nulls returns a byte forward and restores null_mask to the initial value of 1. Because the initial value is 1, you can guess that if this condition is met, it indicates that eight nullable columns have been written, so you need to move the first byte to continue writing null information, but it is found that null_mask is of the int type, nulls is a byte that does not match. However, if the condition is (byte) null_mask, the value is 0. For each field, null_mask is executed to shift one digit to the left. Therefore, after eight shifts, the values of 8 digits are 0.
Row 9th indicates that if the data in this column is a null value, the null value needs to be reflected in the nulls array, because the current value of null_mask is actually 1) in fact, it indicates the correspondence between the fields being processed in the current nulls byte. That is to say, if the current field value is null, as shown in row 10th, use null_mask or the nulls byte. If it is not null, the value of the corresponding bit is 0.
Therefore, we can see that bitmaps in the entire nulls space represent the null information of all nullable columns in the forward order.
- if (fixed_len) {
- } else if (dfield_is_ext(field)) {
- *lens-- = (byte) (len >> 8) | 0xc0;
- *lens-- = (byte) len;
- } else {
- if (len < 128 || (dtype_get_len(type) < 256 && dtype_get_mtype(type) != DATA_BLOB)) {
- *lens-- = (byte) len;
- } else {
- *lens-- = (byte) (len >> 8) | 0x80;
- *lens-- = (byte) len;
- }
- }
- memcpy(end, dfield_get_data(field), len);
- end += len;
From the first line, we can see that for a fixed-length data, you only need to write the data into the record, which mainly processes variable-length data, row 2nd indicates that if the length is greater than 256 bytes, or the data type is BLOB, two bytes are used to store the length. Low-byte storage (len> 8) | 0xc0, high-byte storage (byte) len is truncated ). Others can be seen directly.
To 13 rows, data is directly copied to the data storage space, expressed by end. After a field is stored, the next field is stored in the order defined by the index.
Here, the physical conversion of a record logic is complete, and we also know how Innodb stores its physical records.
Conclusion: Looking at the innodb code, we can say that the code is very elegant and refined, so it is difficult to understand it all at once in some places. We need to speculate and try to understand it in depth. At the same time, there are many places where they are directly hard-coded, which makes it more difficult to understand. The best way is to name them through macros, which is helpful for understanding.
Link: http://www.cnblogs.com/bamboos/archive/2013/03/04/2943160.html
Edit recommendations]
- Database Operation example for MySQL learning notes for beginners
- Table operation example for beginners of MySQL learning notes
- MySQL management tips for beginners
- MySQL query example for beginners
- Summary of common operations for beginners on MySQL Study Notes