Primary key selection for InnoDB Engine tables

MySQL uses an open, pluggable storage engine architecture that works much like a power strip: the storage engines plugged into it are like electrical appliances. The engines most commonly used are MyISAM and InnoDB. In the early days, businesses mainly ran on MyISAM. As MySQL came to support more and more business workloads, the stored data became increasingly important to enterprises. At the same time, the maximum memory supported by PC servers kept growing and memory prices kept falling, so InnoDB was gradually adopted. The two engines differ in style, in their internal storage algorithms, and in how they implement data manipulation. InnoDB also differs from the storage engines of other commercial database products. We must therefore design data storage structures and data manipulation methods around the features of the storage engine. This article focuses on primary key design for the InnoDB storage engine and explains how to choose the primary key of a table.

Before discussing the requirements for primary key selection for InnoDB tables, let's look at how InnoDB stores row data and index data.

The data and indexes of an InnoDB table are stored in the same file. The default InnoDB page size is 16 KB and pages are filled to 15/16 of their capacity, so the amount of data each page can hold is limited. The primary key and the data are related as follows: the row data is stored in the leaf nodes of the clustered index. Figure 1-1 shows the relationship between the primary key and the way the data is organized:
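As a rough illustration of why page capacity is limited, the arithmetic can be sketched in Python. The row sizes are hypothetical, and the sketch ignores page headers, record headers, and other per-page overhead, so it gives an upper bound only.

```python
PAGE_SIZE = 16 * 1024   # default InnoDB page size in bytes


def rows_per_page(row_bytes: int) -> int:
    """Upper bound on rows per page; ignores page and record overhead."""
    usable = PAGE_SIZE * 15 // 16   # 15/16 fill ratio quoted in the text
    return usable // row_bytes


# Hypothetical row sizes, chosen only for illustration.
for row_bytes in (100, 200, 400):
    print(row_bytes, "bytes per row ->", rows_per_page(row_bytes), "rows per page")
```

Doubling the row size roughly halves the number of rows per page, which means more pages must be read to scan the same number of rows.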


Figure 1-1

How are non-clustered indexes stored? According to the official documentation, each leaf node entry of a non-clustered index contains the value of the clustered index, and the corresponding row data can then be found through that clustered index value. Figure 1-2 shows the storage structure of a non-clustered index and its relationship with the clustered index:


Figure 1-2
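The two-step lookup described above can be sketched with plain Python dictionaries. This is a simplified model, not InnoDB's implementation: real indexes are B+trees, and the table, column names, and values here are hypothetical.

```python
# Clustered index: primary key value -> full row (rows live in the leaf nodes).
clustered_index = {
    1: {"id": 1, "name": "alice", "city": "Hangzhou"},
    2: {"id": 2, "name": "bob", "city": "Beijing"},
    3: {"id": 3, "name": "carol", "city": "Hangzhou"},
}

# Non-clustered index on name: indexed value -> primary key value, not the row.
name_index = {"alice": 1, "bob": 2, "carol": 3}


def find_by_name(name):
    """Look up a row through the non-clustered index."""
    pk = name_index.get(name)        # step 1: non-clustered index yields the primary key
    if pk is None:
        return None
    return clustered_index[pk]       # step 2: clustered index yields the row


print(find_by_name("bob"))
```

Because every entry in `name_index` stores a primary key value, the size of the primary key directly affects the size of every non-clustered index.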

Next, let's clarify the three kinds of clustered index an InnoDB table can have:

(1) The primary key, which is the clustered index referred to throughout this article;

(2) If the table has no primary key but has a unique index whose columns are defined as NOT NULL, that index is used as the clustered index;

(3) If the table has neither a primary key nor a NOT NULL unique index, a hidden 6-byte column is created by default and used as the clustered index.
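The three rules above amount to a simple fallback chain, sketched here in Python. The `Index` class and the function name are hypothetical and exist only for illustration; the sketch assumes the indexes are listed in the order they were defined.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Index:
    name: str
    unique: bool
    columns_not_null: bool   # True if every indexed column is NOT NULL


def choose_clustered_index(primary_key: Optional[str], indexes: List[Index]) -> str:
    """Return which structure serves as the clustered index under the three rules."""
    if primary_key is not None:                      # rule 1: explicit primary key
        return primary_key
    for idx in indexes:                              # rule 2: first NOT NULL unique index
        if idx.unique and idx.columns_not_null:
            return idx.name
    return "hidden 6-byte row ID"                    # rule 3: engine-generated column


indexes = [Index("idx_email", unique=True, columns_not_null=False),
           Index("idx_account", unique=True, columns_not_null=True)]
print(choose_clustered_index(None, indexes))
```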

The discussion above clarifies how much data a storage page can hold and how clustered and non-clustered indexes are stored. We recommend always creating an explicit primary key for an InnoDB table rather than relying on a NOT NULL unique index in place of a primary key. A good primary key has three qualities: its value range is sufficient while its storage length is as short as possible, its values are unique and easy to compare, and its values are inserted in increasing order. The advantages of each quality are analyzed below.

If the value range of the primary key is sufficient and its length is short, there are three advantages:

(1) For tables with the same number of rows, the smaller the data volume, the less disk space is occupied. This saves physical and logical I/O and reduces memory usage;

(2) Common (non-clustered) indexes become shorter, which reduces the physical and logical I/O of lookups that go through them;

(3) Page fragmentation in index storage is reduced, which improves page utilization and reduces physical I/O.
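To make the effect of key length concrete, the following Python sketch estimates how many bytes are spent on primary key values alone. It assumes the key is stored once in the clustered index and once more in every common index entry; the row count and index count are hypothetical, and all other storage overhead is ignored.

```python
# Storage size of one key value in bytes; the 36-byte figure assumes a
# 36-character UUID string stored one byte per character.
KEY_BYTES = {"INT": 4, "BIGINT": 8, "36-char UUID": 36}


def primary_key_bytes(rows: int, common_indexes: int, key_bytes: int) -> int:
    """Bytes used by primary key values in the clustered index plus every common index."""
    return rows * key_bytes * (1 + common_indexes)


rows, common_indexes = 10_000_000, 3   # hypothetical table
for key_type, size in KEY_BYTES.items():
    total = primary_key_bytes(rows, common_indexes, size)
    print(f"{key_type}: {total / 1_000_000:.0f} MB")
```

Under these assumptions the UUID key costs nine times as much as the four-byte integer key, and the gap widens with every additional common index.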

Uniqueness of the primary key:

(1) The primary key value must be unique and not null;

(2) Each time a primary key value is inserted or modified, the engine must check whether the same value already exists. Shortening the index values being compared therefore improves performance, as does converting the value into a form that is cheaper to compare.

Inserting primary key values in order has two advantages:

(1) Primary key storage is ordered both across blocks and within each block, so inserting values in order reduces re-sorting within blocks and saves physical disk I/O;

(2) Ordered primary key values can speed up data recovery, increase server throughput, and save physical I/O.
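The cost of unordered insertion can be illustrated with a toy simulation in Python. This is a simplified model under stated assumptions, not InnoDB's actual page-split algorithm: pages are sorted lists with a fixed capacity, a full page is split in half, and a key larger than everything in the last page simply starts a new page.

```python
import bisect
import random


def simulate_inserts(keys, capacity=50):
    """Return (pages used, rows moved by page splits) for a toy leaf level."""
    pages = [[]]              # each page is a sorted list of keys
    rows_moved = 0
    for key in keys:
        # Find the first page whose largest key is >= key, else the last page.
        i = 0
        while i < len(pages) - 1 and pages[i][-1] < key:
            i += 1
        page = pages[i]
        if len(page) < capacity:
            bisect.insort(page, key)
        elif i == len(pages) - 1 and key > page[-1]:
            pages.append([key])                 # append-only growth: nothing moves
        else:
            mid = len(page) // 2                # split the full page in half
            upper = page[mid:]
            del page[mid:]
            pages.insert(i + 1, upper)
            rows_moved += len(upper)
            bisect.insort(page if key < upper[0] else upper, key)
    return len(pages), rows_moved


ordered = list(range(5000))
shuffled = ordered[:]
random.Random(42).shuffle(shuffled)   # fixed seed so the run is repeatable

print("ordered insert :", simulate_inserts(ordered))
print("random insert  :", simulate_inserts(shuffled))
```

In this model, ordered insertion fills every page completely and never moves an existing row, while random insertion triggers splits that move rows and leave pages partly empty.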

Beyond the storage structures discussed above and the theoretical qualities a primary key should have, we also need to consider the business, the architecture of the actual production environment, and other factors. In our production environments, four kinds of attributes may be used as the primary key:

(1) An auto-incrementing sequence;

(2) A random value generated by the UUID() function;

(3) The unique account name registered by the user, stored as a 40-character string;

(4) A self-incrementing value produced by a dedicated mechanism, such as a sequence generator.

Next, we will analyze the advantages and disadvantages of these four types of attributes as the table primary key:

(1) Auto-incrementing sequence: new values are generated in ascending order, the numeric data type makes primary key comparison cheap, and the storage footprint is small, typically a four-byte INT or an eight-byte BIGINT. To split data horizontally, you can set two parameters of the mysqld instance: auto_increment_increment and auto_increment_offset. The only drawback is that generating auto-increment values takes a table-level lock. In the 5.0 series, the way this lock is released can become a bottleneck under heavy concurrent writes; the 5.1 series improved this, and the problem has largely disappeared;

(2) UUID() function: the value combines a random part with a fixed part and is unordered. About 77.8% of the characters are identical across values generated on the same server. Each value is 36 characters long, which takes 36 bytes of storage in UTF-8 encoding. Horizontal data splitting is supported without special settings;

(3) Account name registered by the user: a string whose value depends on user input, so the data is essentially unordered and the length varies; the minimum and maximum lengths can only be constrained at the front end. Horizontal splitting is supported without special settings;

(4) Sequence generator: similar to an auto-incrementing sequence, but it requires additional development work and a third-party service. In return, it avoids the table-level lock of auto-increment columns, improves concurrency, and supports horizontal data splitting better;

(5) In a dual-master replication architecture, there is some probability of a problem in one scenario: data is written successfully on the master but has not yet been replicated to the online standby server. Manual intervention is then required, and there is no simple, reliable way to automate it. None of the four methods above avoids this.
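The way auto_increment_increment and auto_increment_offset keep key ranges disjoint across horizontally split instances can be sketched in Python. The sketch models only the arithmetic series offset, offset + increment, offset + 2 × increment, and so on; it does not reproduce the server's edge-case handling, and the three-instance setup is hypothetical.

```python
from itertools import count, islice


def auto_increment_values(offset: int, increment: int):
    """Yield offset, offset + increment, offset + 2 * increment, ..."""
    return count(start=offset, step=increment)


instances = 3   # hypothetical number of mysqld instances
for instance in range(1, instances + 1):
    # Each instance uses increment = number of instances, offset = its own number.
    values = list(islice(auto_increment_values(instance, instances), 4))
    print(f"instance {instance}: {values}")
```

Each instance still generates ascending values, and no two instances can ever produce the same key.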

Comparing the advantages and disadvantages of the four kinds of primary key against the qualities a good primary key should have, we conclude the following. If horizontal splitting, which requires extra setup, does not need to be considered, an auto-incrementing sequence is the best choice of primary key. If the user's registered account name is required to be unique and not null, it can also be used as the primary key. If horizontal splitting must be considered, values produced by a sequence generator architecture, which is easy to use and reliable, are the best choice of primary key.
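One possible shape for such a sequence generator is block allocation, sketched below in Python. This is an assumption for illustration, not a design described in this article: the class names `SequenceService` and `IdAllocator` are hypothetical, and a real service would persist its state and run as a separate process.

```python
import threading


class SequenceService:
    """Hypothetical central service that hands out disjoint blocks of IDs."""

    def __init__(self, block_size: int = 1000):
        self._next_start = 1
        self._block_size = block_size
        self._lock = threading.Lock()

    def allocate_block(self):
        """Return a half-open range (start, end) that no other caller receives."""
        with self._lock:
            start = self._next_start
            self._next_start += self._block_size
        return start, start + self._block_size


class IdAllocator:
    """Per-application-node allocator; serves IDs locally from its current block.

    Not thread-safe by itself: the sketch assumes one allocator per thread.
    """

    def __init__(self, service: SequenceService):
        self._service = service
        self._next, self._end = service.allocate_block()

    def next_id(self) -> int:
        if self._next >= self._end:              # block exhausted: fetch a new one
            self._next, self._end = self._service.allocate_block()
        value = self._next
        self._next += 1
        return value


service = SequenceService(block_size=3)
node_a, node_b = IdAllocator(service), IdAllocator(service)
print([node_a.next_id() for _ in range(4)])
print([node_b.next_id() for _ in range(2)])
```

The central lock is taken once per block rather than once per row, which is how this design reduces contention. The trade-off is that IDs are ascending within each node but not strictly ordered across nodes.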

[Conclusion]

Which kind of attribute should be used as the primary key? The core considerations are the storage engine (how it stores row data, how it retrieves row data, and how it maintains its internal index structures) and the business we want to implement. Together these determine how we design tables for storing data and how we write SQL statements for manipulating it; we then decide how to create indexes based on the characteristics of the business. We suggest paying close attention to the internal implementation principles and mechanisms of the InnoDB engine, for which the official InnoDB documentation on internals is a good reference, while also analyzing and focusing on the characteristics of your business.

This article is from http://www.mysqlops.com/2011/09/10/innodb-primary.html
