Primary Key Selection in Database Sharding Design

Source: Internet
Author: User
Tags: database sharding

In the previous article, "Database Sharding Design in the Youpai Network Architecture", I mentioned the problem of primary key selection in a sharded MySQL design. In this article I want to discuss that issue in more depth, as a supplement to the previous one.

As mentioned there, Youpai uses a globally unique field as the primary key. Take the photo table as an example: although the photo data of different users is stored in different shards (i.e. different MySQL nodes/instances; see "Database Sharding Design in the Youpai Network Architecture"), each photo has an ID that is unique across the entire site as its identifier.

Why must it be globally unique?

When we resize the database cluster, we need to move data between shards to keep the load balanced. If primary keys are not globally unique, we cannot move data freely. At first we considered a composite primary key to solve this, typically user_id plus an auto-increment photo_id. That does avoid the key conflicts caused by moving data, but as described in "Database Sharding Design in the Youpai Network Architecture", once data moves between shards we would need ever more fields in the composite key to guarantee uniqueness. The primary key index would then change substantially, hurting query performance and write performance alike.

Second, each shard consists of two MySQL servers in a master-master replication setup, so that every shard is always writable. With master-master replication we must guarantee that rows inserted on the two servers never share a primary key; otherwise replicating the data to the other instance fails with duplicate-key errors. If the primary key is globally unique, this problem disappears by itself. Even in a design without sharding, if you want to keep using an auto-increment field under master-master replication, you can refer to the approach described later in this article.
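As an aside, MySQL's built-in way of keeping auto-increment values from colliding across a master-master pair is to give each server a different offset and an increment equal to the number of servers. A minimal sketch, for context only (the two variables are standard MySQL settings; the values are illustrative, not what Youpai uses):

-- On server 1: generates 1, 3, 5, ...
SET GLOBAL auto_increment_increment = 2;
SET GLOBAL auto_increment_offset = 1;

-- On server 2: generates 2, 4, 6, ...
SET GLOBAL auto_increment_increment = 2;
SET GLOBAL auto_increment_offset = 2;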

Possible solutions
  • UUID

You could use a UUID as the primary key, but a UUID is a long string and looks ugly in a URL. That is not the key issue, though; the more important reason is performance. UUIDs are not generated sequentially, so writes have to touch random positions in the index, which costs extra I/O, especially once the index becomes too large to fit in memory. In a UUID index each key takes 32 bytes (or 16 bytes if stored in binary form), so the index as a whole is also relatively large.
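For illustration only, here is roughly what a UUID-keyed table might look like, with the 32 hex characters packed into a BINARY(16) column (the table and column names are made up for this sketch); note that even in this compact form the values are still random, so the index-locality problem described above remains:

-- Hypothetical UUID-keyed table, with the UUID packed into 16 bytes.
CREATE TABLE `photos_uuid` (
  `id` BINARY(16) NOT NULL,
  `user_id` BIGINT UNSIGNED NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

-- Strip the dashes from UUID() and pack the hex string into binary before inserting.
INSERT INTO `photos_uuid` (`id`, `user_id`) VALUES (UNHEX(REPLACE(UUID(), '-', '')), 42);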

  • MySQL auto-increment field

In an application with a single MySQL database, you can simply use an auto-increment field. In a horizontally sharded design, however, that is no longer globally unique. We can instead create a separate database just for generating IDs. Each table in the shards has a corresponding table in this ID database, and that corresponding table has only one field, an auto-increment field. When we need to insert new data, we first insert a row into the corresponding table in the ID database to obtain a new ID, and then use that ID as the primary key when inserting into the shard. The drawback is the extra insert on every write, and if the ID database grows large, performance also degrades. So we have to make sure the ID database's dataset stays small, for example by periodically purging old rows.
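A minimal sketch of this scheme, assuming an ID database with a `photo_ids` table that mirrors the sharded `photos` table (all names here are illustrative):

-- In the dedicated ID database: one table per business table, with a single auto-increment column.
CREATE TABLE `photo_ids` (
  `id` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

-- Step 1: grab a new globally unique ID from the ID database.
INSERT INTO `photo_ids` () VALUES ();
SELECT LAST_INSERT_ID();

-- Step 2: use that ID as the primary key when writing to the target shard
-- (123456 stands in for the value returned above).
INSERT INTO `photos` (`id`, `user_id`) VALUES (123456, 42);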

  • Introduce other tools

Redis and memcached both support atomic increment operations, and because of their excellent performance they could reduce the extra overhead on writes, so perhaps we could use one of them as a sequence generator. The problem with memcached is that it is not persistent, so we ruled it out. Redis does not persist in real time by default; it can be configured to, but that feels like an odd fit. There are also other persistent stores such as Kyoto Cabinet, Tokyo Cabinet, and MongoDB, which are said to perform well, but introducing another tool increases the complexity of the architecture and the maintenance cost. Our team is very small and our energy is limited, so we stick to a "good enough" principle: unless there is a strong reason otherwise, we solve problems with tools we already know, as long as the result is acceptable. So let's look at how to solve this problem with MySQL itself.

Better Solution

At first we used the MySQL auto-increment approach described above. Later I read Flickr's post "Ticket Servers: Distributed Unique Primary Keys on the Cheap", and the method described there was an eye-opener. I often wonder how far we could get on our own, without those open-source products and the people who share their experience. I am very grateful to them, which is why I share my own experience as much as I can.

First, let me describe the method from the Flickr article. They use MySQL's REPLACE INTO extension. REPLACE INTO works like INSERT, except that if the new row has the same primary key or unique key as an existing row, the existing row is deleted first and the new row is then inserted. Rest assured, this is an atomic operation.

Create a table similar to the following:

CREATE TABLE `tickets64` (
  `id` bigint(20) unsigned NOT NULL auto_increment,
  `stub` char(1) NOT NULL default '',
  PRIMARY KEY (`id`),
  UNIQUE KEY `stub` (`stub`)
) ENGINE=MyISAM;

To obtain a globally unique ID, run the following SQL statements:

REPLACE INTO `tickets64` (`stub`) VALUES ('a');
SELECT LAST_INSERT_ID();

After these statements run for the first time, the tickets64 table contains the following data:

+----+------+
| id | stub |
+----+------+
|  1 | a    |
+----+------+

If we run the same statements again, the row whose stub value is 'a' already exists, so MySQL deletes that row and inserts a new one. After the second execution the tickets64 table therefore still holds a single row, but the value of the id field is now 2. This table will only ever contain one row of data.

Flickr created separate ticket tables for its important objects such as photos, groups, accounts, and tasks, so that each of them gets a contiguous sequence of IDs; the IDs of the remaining business tables are all generated from one shared ticket table.

That is good, but it can be better: a single table can hand out contiguous IDs for every business table. Let's look at our method, starting with the table structure:

CREATE TABLE `sequence` (
  `name` varchar(50) NOT NULL,
  `id` bigint(20) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`name`)
) ENGINE=InnoDB;

Note that the id field is neither auto-incrementing nor the primary key. Before using the table, we need to insert some initialization data:

INSERT INTO `sequence` (`name`) VALUES ('users'), ('photos'), ('albums'), ('comments');

Next, we can execute the following SQL statements to obtain a new photo ID:

UPDATE `sequence` SET `id` = LAST_INSERT_ID(`id` + 1) WHERE `name` = 'photos';
SELECT LAST_INSERT_ID();

The UPDATE increments the id field by 1 and passes the incremented value to the LAST_INSERT_ID() function, which sets the value that the following SELECT LAST_INSERT_ID() returns on this connection.
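To illustrate (assuming only the initialization above and no other traffic), each name keeps its own independent counter:

-- Second call for 'photos': returns 2.
UPDATE `sequence` SET `id` = LAST_INSERT_ID(`id` + 1) WHERE `name` = 'photos';
SELECT LAST_INSERT_ID();

-- First call for 'comments': returns 1, independent of the 'photos' counter.
UPDATE `sequence` SET `id` = LAST_INSERT_ID(`id` + 1) WHERE `name` = 'comments';
SELECT LAST_INSERT_ID();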

In fact, we do not need to initialize the sequence name in advance. If we need a new sequence, we can directly execute the following SQL statements:

INSERT INTO `sequence` (`name`) VALUES ('new_business') ON DUPLICATE KEY UPDATE `id` = LAST_INSERT_ID(`id` + 1);
SELECT LAST_INSERT_ID();

Here we use MySQL's INSERT ... ON DUPLICATE KEY UPDATE extension. It inserts a new row just like INSERT, but if the new row has the same primary key or unique key as an existing row, the existing row is updated instead.

Note that the first time we execute the statement above, the plain INSERT path is taken because no row named 'new_business' exists yet, so the UPDATE clause does not run and no value is passed to LAST_INSERT_ID(). The value returned by SELECT LAST_INSERT_ID() is therefore undefined: it depends on what the current connection did earlier. If nothing earlier on the connection set LAST_INSERT_ID(), the return value is 0; otherwise it is whatever that earlier operation produced. For this reason we should avoid relying on this form.

Update: this method also makes it easy to avoid a single point of failure, and it is not limited to two servers. Simply give each server a different (but consecutive) initial value, and set the increment to the number of servers.
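A minimal sketch of the two-server variant just described (the seed values and the 'photos' name are illustrative): server A is seeded at 1 and server B at 2, both step by 2, so they hand out disjoint, interleaved IDs:

-- On sequence server A: hands out 3, 5, 7, ...
INSERT INTO `sequence` (`name`, `id`) VALUES ('photos', 1);
UPDATE `sequence` SET `id` = LAST_INSERT_ID(`id` + 2) WHERE `name` = 'photos';
SELECT LAST_INSERT_ID();

-- On sequence server B: hands out 4, 6, 8, ...
INSERT INTO `sequence` (`name`, `id`) VALUES ('photos', 2);
UPDATE `sequence` SET `id` = LAST_INSERT_ID(`id` + 2) WHERE `name` = 'photos';
SELECT LAST_INSERT_ID();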

To sum up

The "good enough" principle served us well here. Of course, that doesn't mean you shouldn't learn about other products or solutions. Youpai also uses some newer products, such as Redis (we started using it in production in March, making us relatively early adopters), because introducing it really does solve some of our problems better, more conveniently, and more efficiently. The key is to understand a tool well enough before using it. I will describe how we use Redis in a later article.
