A few days ago, I studied database sharding and database sharding. One of the key issues is the generation of a unique key. If the data table contains 0.1 billion data records, it is constantly increasing, here we need to consider the sub-table and sub-database. Assume that we use hash or the user modulo operation to split the table into 10 tables, with the same structure for each table, there is a primary key ID, so the IDs in the 10 tables must be unique. When using a single table, there is no problem with auto-Growth of data tables. After dividing the table into 10 tables, the database cannot be used for auto-growth.
At this point, I suddenly found that the sequence of the Oracle database is really good. I was still very depressed when I first came into contact with it. This design is really not convenient for MySQL to get the SQL server.
Currently, there are basically three methods for unique IDs.
1. the uuid is used for implementation. It is fast and does not repeat, but the generated ID has no rules.
2. The external ID distribution center is used for implementation. The surviving IDs are short and have rules. The disadvantage is that they depend on the single point of failure (spof ).
3. Make a counter table in the database. The class is the sequence in Oracle.
Most of the following content comes from the network:
UUID
The purpose of UUID is to allow all elements in a distributed system to have unique identification information without specifying the identification information through the central control terminal. In this way, each user can create a uuid that does not conflict with others. In this case, duplicate names are not required during database creation.
A uuid is composed of 16-digit numbers in a 16-digit group (also called 128-bit). Therefore, the total number of UUID is 216x8 = 2128 in theory, it is about 3.4x1038. That is to say, if 1 mb uuid is generated per second, it takes 10 billion years for all UUID to be used up.
The standard UUID type contains 32 16-digit numbers. It consists of up to 36 characters in the format of 8-4-4-4-12. Example:
-
550e8400-e29b-41d4-a716-446655440000
Twitter's Snowflake (ID Distribution Center)
Snowflake is an open-source ID generation server for Twitter that is applicable to distributed environments. The generated ID is 64 bits, satisfying both high performance (> 10 k IDS/S), low latency (<2 ms) and high availability. Similar to MongoDB objectid, the IDS generated here are time-ordered. The encoding method is similar to objectid, as shown below:
0 41 51 64+-----------+------+------+|time |pc |inc |+-----------+------+------+
- The first 41bits is the timestamp in microseconds.
- Then 10bits is the machine ID configured in advance.
- The last 12bits is the accumulate counter.
MongoDB objectid (similar to UUID)
Each record in MongoDB has'The id' field uniquely identifies this record. If the user does not display the provided information when inserting data'Id' field, the system will automatically generate one. Objectid has a total of 12 bytes. It is designed to fully consider the use of the distributed environment, so it can be unique in a distributed MongoDB cluster. The objectid format is as follows:
0 4 7 9 12+--------+------+----+------+|time |pc |pid |inc |+--------+------+----+------+
- The first four bytes are UNIX timestamp.
- The next three bytes are the first three bytes of the MD5 result of one of the "Hostname/MAC address/virtual number" of the current machine.
- The next two bytes are the PID of the current process.
- The last three bytes are the accumulate counter or a random number (the random number is used only when the accumulate counter is not supported ).
The last generation is still a string in hexadecimal notation, such47cc67093475061e3d95369d
. Here, the objectid of MongoDB has a great advantage over the uuid, that is, the objectid is time-ordered. In addition, objectid itself also contains a lot of other useful information, which can be directly obtained by directly decoding objectid.
Ticket Server (Database survival mode)
This is the method used by Flickr to generate a global ID. Using the features of auto_increment in the database and the unique features of MySQLREPLACE INFO
Command to generate an ID for a database instance. The general process is as follows:
- First, create a table, for example, to generate 64bitsid, called 'ticket64 ′
CREATE TABLE `Tickets64` ( `id` bigint(20) unsigned NOT NULL auto_increment, `stub` char(1) NOT NULL default '', PRIMARY KEY (`id`), UNIQUE KEY `stub` (`stub`)) ENGINE=MyISAM
- After inserting a record into it, it is roughly like this:
+-------------------+------+| id | stub |+-------------------+------+| 72157623227190423 | a |+-------------------+------+
- When a 64 bits ID is required, run the following SQL statement:
REPLACE INTO Tickets64 (stub) VALUES ('a');SELECT LAST_INSERT_ID();
In addition, you can set two Ticket Server instances to prevent single point of failure (spof) of the ticket server. One generates an odd ID and the other generates an even ID.
TicketServer1:auto-increment-increment = 2auto-increment-offset = 1TicketServer2:auto-increment-increment = 2auto-increment-offset = 2
The application requests two servers alternately, which reduces both the pressure and Fault Risk by half. However, there is also a problem here, that is, when one machine fails, the ID generated by another normal machine will lead the faulty machine forward, and may not be a sequential ID in time. According to Flickr, this does not affect their applications.
Instagram (UUID)
Instagram needs to split the images stored in it into multiple PostgreSQL instances. The ID generation scheme is similar to that of MongoDB objectid. The length of the entire ID is 64 bits, which is set to optimize the storage in redis. The ID encoding format is as follows:
- 41bits is a timestamp in microseconds, starting from.
- 13bits indicates the shard ID of the logical shard.
- 10bits indicates an accumulation counter.
The ID generation logic is written to the PostgreSQL database in PL/pgsql. Each time data is inserted, it is automatically calculated and generated by the database.