MySQL primary key design

Source: Internet
Author: User
Tags bulk insert uuid

Original: MySQL primary key design

While working on a project I ran into a problem that looked very basic, but thinking it through raised many further questions, so I think it is worth recording what I learned.

MySQL primary key design principles
    • A MySQL primary key should carry no meaning for the user.
    • A MySQL primary key should be a single column, which makes joins and filtering more efficient.
    • Never update a MySQL primary key.
    • A MySQL primary key should not contain dynamically changing data, such as a timestamp, creation time, or modification time column.
    • A MySQL primary key should be generated automatically by the computer.
Common schemes for primary key design

Auto-increment ID

Advantages:

1. The database numbers rows automatically, which is fast, and the values grow monotonically. Because the clustered primary key stores rows in key order, this is very favorable for lookups.

2. The key is numeric, so it takes little space, sorts easily, and is convenient to pass around.

Disadvantages:
1. It does not support a horizontally sharded architecture: once the data is split across shards, each shard's own counter clearly cannot guarantee global uniqueness.
2. Table lock

Before MySQL 5.1.22, InnoDB generated auto-increment values from an internal counter that was protected by a table-level lock (AUTO-INC locking). The lock is released not when the transaction completes but as soon as the SQL statement that inserts the auto-increment value finishes; other inserts must wait for that release. For example, when a table has an AUTO_INCREMENT column, InnoDB keeps a counter in memory that records the auto-increment value; when a new row is inserted, a table lock protects the counter until the insert ends. With a large number of concurrent inserts, this table lock causes SQL statements to queue up.
Starting with 5.1.22, InnoDB provides the parameter innodb_autoinc_lock_mode to address the table-lock problem of auto-increment primary keys:

    • 0: Table locking. Every kind of insert uses AUTO-INC locking (the table lock mechanism).
    • 1: The default through MySQL 5.7 (MySQL 8.0 defaults to 2). A simple insert generates its values under a lightweight mutex on the in-memory counter; a bulk insert still uses the table lock.
    • 2: All insert-like statements generate their values under the mutex. This gives the highest performance, but concurrent inserts may produce non-consecutive values and statement-based replication may become inconsistent, so this mode requires row-based replication.

3. Auto-increment values are not always continuous

Create Table: CREATE TABLE `tmp_auto_inc` (  `id` int(11) NOT NULL AUTO_INCREMENT,  `talkid` int(11) DEFAULT NULL,  

After 10 records are inserted in bulk, the table's AUTO_INCREMENT counter stands at 16, so the next insert skips ahead and the primary key is no longer continuous.
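The jump to 16 matches the behaviour commonly observed when a bulk insert does not know its row count in advance: values are reserved in chunks of 1, 2, 4, 8 and so on, and unused values in the last chunk are lost. The Python sketch below is only a model of that doubling rule, not InnoDB code; the function name and the rule as written are assumptions made for illustration.

```python
def next_auto_increment_after_bulk_insert(rows: int, start: int = 1) -> int:
    """Model the counter after one bulk insert that reserves values in doubling chunks.

    Assumption: chunks of 1, 2, 4, 8, ... values are reserved until at least
    `rows` values are available; leftover values are never handed back.
    """
    next_free = start  # first value that has not been reserved yet
    reserved = 0       # how many values this statement has reserved so far
    chunk = 1
    while reserved < rows:
        next_free += chunk
        reserved += chunk
        chunk *= 2
    return next_free


# 10 rows need chunks of 1 + 2 + 4 + 8 = 15 values, so the counter ends at 16.
print(next_auto_increment_after_bulk_insert(10))
```

Under this model a single-row insert leaves no gap, while a 10-row bulk insert wastes ids 11 to 15.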

UUID

Advantages
1. Global uniqueness, security, and portability.

2. Independence: the program can be migrated between different databases without the keys being affected.

3. The generated ID is unique not only within the table but also across databases, which is particularly important when you split the database.

Disadvantages
1. It increases I/O pressure on InnoDB. InnoDB is a clustered-index engine: rows are stored in primary key order, so the storage location of a row depends on its key. Because UUIDs are unordered, new rows land at arbitrary positions and existing data has to be moved frequently, which produces heavy I/O and severely hurts performance.
2. A UUID is long: 128 bits (16 bytes). A large primary key reduces the number of index entries that fit in memory, which lowers the in-memory index hit ratio, and index lookups that have to go to disk perform poorly. This can severely affect the overall performance of the database server.
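To put the size argument in numbers, the following Python sketch compares key sizes and a rough upper bound on how many keys fit in one 16 KiB page (InnoDB's default page size). The bound ignores all record and page overhead, and the CHAR(36) figure assumes a single-byte character set; both are simplifications made for illustration.

```python
import uuid

PAGE_BYTES = 16 * 1024  # InnoDB's default page size


def keys_per_page(key_bytes: int) -> int:
    """Upper bound on keys per page, ignoring per-record and per-page overhead."""
    return PAGE_BYTES // key_bytes


u = uuid.uuid4()
key_sizes = {
    "INT": 4,
    "BIGINT": 8,
    "UUID as BINARY(16)": len(u.bytes),
    "UUID as CHAR(36), 1 byte per character": len(str(u)),
}

for name, size in key_sizes.items():
    print(f"{name}: {size} bytes, at most {keys_per_page(size)} keys per page")
```

Even in its compact binary form a UUID key is twice the size of a BIGINT, and the textual form is more than four times as large.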

Custom Sequence Table

A custom sequence table is a table created in the database specifically to store sequence information; the sequence generation strategy itself is implemented at the application level. Create the sequence table as shown below:

CREATE TABLE `sequence` (
    `name` varchar(50) NOT NULL,
    `id` bigint(20) unsigned NOT NULL DEFAULT '0',
    PRIMARY KEY (`name`)
) ENGINE=InnoDB;

Note that the id column is neither auto-increment nor the primary key. Before using the table, we need to insert some initial rows:

INSERT INTO `sequence` (`name`) VALUES ('users'), ('photos'), ('albums'), ('comments');

Next, we can get a new photo ID by executing the following SQL statement:

UPDATE `sequence` SET `id` = LAST_INSERT_ID(`id` + 1) WHERE `name` = 'photos';
SELECT LAST_INSERT_ID();

The UPDATE increases the id column by 1 and passes the incremented value to LAST_INSERT_ID(), which makes that value the result of the following SELECT LAST_INSERT_ID() on the same connection.

In fact, we don't necessarily need to specify the name of the sequence beforehand. If we need a new sequence now, we can execute the following SQL statement directly:

INSERT INTO `sequence` (`name`) VALUES ('new_business') ON DUPLICATE KEY UPDATE `id` = LAST_INSERT_ID(`id` + 1);
SELECT LAST_INSERT_ID();
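The same get-or-create flow can be tried without a MySQL server. The sketch below uses Python's built-in sqlite3 module as a stand-in: SQLite has no LAST_INSERT_ID(expr), so the increment and the read are wrapped in one transaction instead. The helper name next_id is hypothetical, and unlike the MySQL statements above the first value of a new sequence here is 1.

```python
import sqlite3


def next_id(conn: sqlite3.Connection, name: str) -> int:
    """Return the next value of the named sequence, creating it on first use."""
    with conn:  # one transaction: create-if-missing, increment, read back
        conn.execute(
            "INSERT OR IGNORE INTO sequence (name, id) VALUES (?, 0)", (name,)
        )
        conn.execute("UPDATE sequence SET id = id + 1 WHERE name = ?", (name,))
        row = conn.execute(
            "SELECT id FROM sequence WHERE name = ?", (name,)
        ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequence (name TEXT PRIMARY KEY, id INTEGER NOT NULL DEFAULT 0)"
)

print(next_id(conn, "photos"))  # first photo id
print(next_id(conn, "photos"))  # second photo id
print(next_id(conn, "albums"))  # an independent sequence
```

Each sequence name advances independently, which is the property the single shared table is meant to provide.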

The problem with this scheme is that the sequence generation logic moves out of the database layer and becomes the responsibility of the application layer, which increases development complexity. Spring can help here, because Spring JDBC already provides a simple encapsulation of this logic.
Let's look at the relevant Spring source code, MySQLMaxValueIncrementer:

    @Override
    protected synchronized long getNextKey() throws DataAccessException {
        if (this.maxId == this.nextId) {
            /*
             * Need to use straight JDBC code because we need to make sure that the insert and select
             * are performed on the same connection (otherwise we can't be sure that last_insert_id()
             * returned the correct value)
             */
            Connection con = DataSourceUtils.getConnection(getDataSource());
            Statement stmt = null;
            try {
                stmt = con.createStatement();
                DataSourceUtils.applyTransactionTimeout(stmt, getDataSource());
                // Increment the sequence column...
                String columnName = getColumnName();
                stmt.executeUpdate("update " + getIncrementerName() + " set " + columnName +
                        " = last_insert_id(" + columnName + " + " + getCacheSize() + ")");
                // Retrieve the new max of the sequence column...
                ResultSet rs = stmt.executeQuery(VALUE_SQL);
                try {
                    if (!rs.next()) {
                        throw new DataAccessResourceFailureException("last_insert_id() failed after executing an update");
                    }
                    this.maxId = rs.getLong(1);
                }
                finally {
                    JdbcUtils.closeResultSet(rs);
                }
                this.nextId = this.maxId - getCacheSize() + 1;
            }
            catch (SQLException ex) {
                throw new DataAccessResourceFailureException("Could not obtain last_insert_id()", ex);
            }
            finally {
                JdbcUtils.closeStatement(stmt);
                DataSourceUtils.releaseConnection(con, getDataSource());
            }
        }
        else {
            this.nextId++;
        }
        return this.nextId;
    }

Spring's implementation increments the columnName column of the incrementerName table with an UPDATE statement and reads back the most recently generated value through MySQL's last_insert_id(). It also handles transactions and concurrent access to the method (the method is synchronized). The implementation is rather simple, however. For example, it uses one table per sequence, which is too fragmented for real application development, so in practice it needs to be modified to use one record per sequence. It also does not consider horizontal sharding. Finally, this approach still cannot avoid the locking mechanism, which is why getCacheSize() is used: each request reserves a block of values that is then cached in memory, reducing how often the lock is taken.
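The block-caching idea can be sketched in a few lines of Python. This is not Spring's code: CachingIncrementer and FakeSequenceRow are hypothetical names, and the in-memory FakeSequenceRow stands in for the UPDATE plus SELECT round trip to the database.

```python
import threading


class FakeSequenceRow:
    """Stand-in for the sequence row; counts simulated database round trips."""

    def __init__(self) -> None:
        self.value = 0
        self.round_trips = 0

    def add_and_get(self, amount: int) -> int:
        self.value += amount
        self.round_trips += 1
        return self.value


class CachingIncrementer:
    """Hands out ids from a locally cached block, refilling one block at a time."""

    def __init__(self, row: FakeSequenceRow, cache_size: int) -> None:
        self._row = row
        self._cache_size = cache_size
        self._max_id = 0
        self._next_id = 0
        self._lock = threading.Lock()

    def next_key(self) -> int:
        with self._lock:  # plays the role of the synchronized method
            if self._max_id == self._next_id:
                # Block exhausted: reserve cache_size new values in one round trip.
                self._max_id = self._row.add_and_get(self._cache_size)
                self._next_id = self._max_id - self._cache_size + 1
            else:
                self._next_id += 1
            return self._next_id


row = FakeSequenceRow()
incrementer = CachingIncrementer(row, cache_size=5)
keys = [incrementer.next_key() for _ in range(12)]
print(keys, row.round_trips)
```

With a cache size of 5, twelve keys cost three simulated round trips instead of twelve. The trade-off is that values left in a cached block are lost if the process restarts.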

How to meet the requirements of horizontal sharding

UUID

Because the probability of a duplicate UUID is negligible, UUIDs support sharding naturally.

Standalone sequence database

Create a separate database for generating IDs. Each table in the shards has a corresponding table in this ID database, and that table has a single auto-increment column. To insert new data, first insert a record into the corresponding table in the ID database to obtain a new ID, then use that ID as the primary key of the row inserted into the shard. The drawbacks are the extra insert and the fact that performance degrades as the ID database grows. So make sure the data set of the ID database stays small; one way is to clean up old records periodically.
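A minimal sketch of this flow, again using Python's sqlite3 module in place of a real ID database. The table name photos_ids and both helper functions are hypothetical; AUTOINCREMENT is used so that SQLite never reuses an id after old rows are purged.

```python
import sqlite3


def new_photo_id(id_db: sqlite3.Connection) -> int:
    """Insert a ticket row into the ID database and return its generated id."""
    with id_db:
        cursor = id_db.execute("INSERT INTO photos_ids DEFAULT VALUES")
        return cursor.lastrowid


def purge_old_tickets(id_db: sqlite3.Connection) -> None:
    """Delete every ticket except the newest one to keep the table small."""
    with id_db:
        id_db.execute(
            "DELETE FROM photos_ids WHERE id < (SELECT MAX(id) FROM photos_ids)"
        )


id_db = sqlite3.connect(":memory:")
id_db.execute("CREATE TABLE photos_ids (id INTEGER PRIMARY KEY AUTOINCREMENT)")

first = new_photo_id(id_db)
second = new_photo_id(id_db)
purge_old_tickets(id_db)
third = new_photo_id(id_db)  # continues after the purge, no id is reused
print(first, second, third)
```

The returned id would then be used as the primary key of the row written to the shard that owns the record.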

Compound identifiers

This approach combines two fields into one unique identifier: the first half is the shard identifier, and the second half is an identifier generated locally on that shard (for example, with AUTO_INCREMENT).
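A small Python sketch of a compound identifier follows. The two-field form mirrors the description above; packing both halves into one 64-bit integer is an extra, optional step, and the 16/48 bit split is an arbitrary assumption chosen for illustration.

```python
from typing import NamedTuple

LOCAL_BITS = 48  # assumption: 16 bits for the shard, 48 bits for the local id
LOCAL_MASK = (1 << LOCAL_BITS) - 1


class CompoundId(NamedTuple):
    shard_id: int  # which shard generated the row
    local_id: int  # value generated locally, e.g. by AUTO_INCREMENT


def pack(cid: CompoundId) -> int:
    """Combine both halves into a single 64-bit integer."""
    if not 0 <= cid.shard_id < (1 << 16) or not 0 <= cid.local_id <= LOCAL_MASK:
        raise ValueError("shard_id or local_id out of range")
    return (cid.shard_id << LOCAL_BITS) | cid.local_id


def unpack(value: int) -> CompoundId:
    """Split a packed identifier back into its two halves."""
    return CompoundId(value >> LOCAL_BITS, value & LOCAL_MASK)


a = CompoundId(shard_id=1, local_id=7)
b = CompoundId(shard_id=2, local_id=7)
print(a != b, unpack(pack(a)) == a)
```

Two shards can hand out the same local value without conflict, because the shard half keeps the combined identifier distinct.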

Custom sequence table with a database-splitting strategy

This approach builds on the custom sequence table described above, with a few adjustments. Recall the statement used there:

UPDATE `sequence` SET `id` = LAST_INSERT_ID(`id` + 1) WHERE `name` = 'photos';
SELECT LAST_INSERT_ID();

Here each shard's sequence must start from a different initial value, and the initial values must be consecutive. At the same time, set the increment step to the number of servers.
For example, with 3 machines we set the initial values to 1, 2, and 3, and then execute the following statement:

UPDATE `sequence` SET `id` = LAST_INSERT_ID(`id` + 3) WHERE `name` = 'photos';
SELECT LAST_INSERT_ID();

This resolves primary key conflicts between shards. However, if you want to add shards after the system has been running for a while, you need to adjust the sequences' initial values to keep them consecutive; otherwise conflicts may occur. This logic can of course be encapsulated in the data access layer.
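The arithmetic behind this scheme is easy to check with a short Python sketch. The function generated_ids is a hypothetical helper that mirrors repeated execution of the UPDATE statement on one shard; the fourth shard at the end is a made-up example of expanding without re-initialising the sequences.

```python
def generated_ids(initial: int, step: int, count: int) -> list:
    """Ids returned by `count` executions of id = LAST_INSERT_ID(id + step)."""
    ids = []
    current = initial
    for _ in range(count):
        current += step
        ids.append(current)
    return ids


SERVERS = 3
per_shard = [generated_ids(initial, SERVERS, 4) for initial in (1, 2, 3)]
all_ids = [value for ids in per_shard for value in ids]
print(per_shard)                          # three interleaved, disjoint series
print(len(all_ids) == len(set(all_ids)))  # no id is produced twice

# A fourth shard added later with initial value 4 but the old step of 3
# reproduces ids that the first shard has already handed out.
late_shard = generated_ids(4, SERVERS, 4)
print(sorted(set(late_shard) & set(all_ids)))
```

The collision in the last line is why the initial values (and with them the step) have to be recalculated for every shard when the number of servers changes.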

The necessity of the primary key

Each row in a table should have a column (or set of columns) that uniquely identifies it. Although a primary key is not strictly required, database designers should give every table a primary key to support later data manipulation and management. In fact, even if you do not define one, MySQL (the InnoDB engine) falls back to a NOT NULL unique index if one exists, and otherwise creates a hidden 6-byte row ID to serve as the primary key.

Because InnoDB uses a clustered index, the data records themselves are stored in the leaf nodes of the primary index (a B+tree). The records within a leaf node (the size of one memory page or disk page) must be stored in primary key order, so whenever a new record is inserted, MySQL places it in the appropriate node and position according to its primary key. When a page reaches its load factor (15/16 by default in InnoDB), a new page (node) is opened.

Therefore, with InnoDB tables, avoid random clustered index keys (non-sequential values spread over a very large range), especially for I/O-intensive applications. For example, from a performance perspective, using UUIDs makes inserts into the clustered index completely random.
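The difference in insert pattern can be made visible with a toy model in Python. A sorted list stands in for the key order of the clustered index, and the sketch counts how many inserts land at the very end, which is the cheap case where a record is simply appended to the last page. This is an illustration of the access pattern only, not a model of InnoDB's page management.

```python
import bisect
import uuid


def count_appends(keys) -> int:
    """Count how many keys are inserted at the end of the sorted key list."""
    ordered = []
    appends = 0
    for key in keys:
        position = bisect.bisect_left(ordered, key)
        if position == len(ordered):
            appends += 1
        ordered.insert(position, key)
    return appends


n = 1000
sequential = count_appends(range(1, n + 1))
random_uuid = count_appends(uuid.uuid4().bytes for _ in range(n))
print(f"sequential keys appended at the end: {sequential} of {n}")
print(f"random UUID keys appended at the end: {random_uuid} of {n}")
```

Every sequential key is an append, while only a handful of random UUID keys are; all the others fall somewhere in the middle of existing data.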

Data type selection for primary key

The most common debate about primary key types is integer versus character, and the book "High Performance MySQL" makes a clear statement on it:
Integers are usually the best choice for identifier columns because they are fast and work with AUTO_INCREMENT. Where possible, avoid string types for identifier columns, because they take up a lot of space and are generally slower than numeric types.

If you use MyISAM, be especially careful with character types: MyISAM uses packed (compressed) indexes for strings by default, which can make lookups very slow.
References:
1. http://www.cnblogs.com/lsx1993/p/4663147.html
2. http://www.cnblogs.com/zhoujinyi/p/3433823.html
3. http://www.zolazhou.com/posts/primary-key-selection-in-database-partition-design/
4. "High Performance MySQL"
5. "MySQL High Availability"
